IBM HTTP Server Powered by Apache
on RS/6000
Heinz Johner, Jouni Auer, Vitolis Bendinskas, Ng Chang Chyn, Shane Owenby, SunJong Park
http://www.redbooks.ibm.com
SG24-5132-00
March 1999
Take Note!
Before using this information and the product it supports, be sure to read the general information in
Appendix A, “Special Notices” on page 225.
This edition applies to the IBM HTTP Server powered by Apache, Version 1.3.3, as part of the IBM
WebSphere Application Server V2.0, Standard Edition and Advanced Edition, Program Numbers
39L9724 and 39L9063 for use on IBM RS/6000.
When you send information to IBM, you grant IBM a non-exclusive right to use or distribute the
information in any way it believes appropriate without incurring any obligation to you.
Figures
Tables
Preface
   The Team That Wrote This Redbook
   Comments Welcome
Chapter 7. Performance and Scalability
   7.1 Basic Performance Consideration
      7.1.1 Link Bandwidth
      7.1.2 Hardware and Operating System
      7.1.3 The Web Server
   7.2 Performance Monitoring
      7.2.1 Hardware and Operating System
      7.2.2 Web Server
   7.3 Scalability for the IBM HTTP Server
      7.3.1 Load Balancing
      7.3.2 File Sharing
Index
The reliable operation of Web servers is very important for organizations that
are present on the World Wide Web. The offering of Internet services as a
new business opportunity is largely dependent on Web servers and has
grown dramatically in the past few years.
To the surprise of many software vendors that offer their own products, a
non-commercial, freely available software has become the de-facto standard
for Web servers. Named Apache, this Web server has quickly gained more
than a 50% share in the market and it has proven to be a very versatile and
reliable Web server software. Due to this occurrence, IBM has chosen to
support the development of Apache and make it available to customers with
valuable security enhancements as part of its e-business product suites with
full IBM support.
This redbook gives a broad understanding of the architecture of the new IBM
HTTP Server powered by Apache on RS/6000, and it will help you install,
tailor and configure that software successfully.
The redbook introduces the general concepts of Apache and then describes
its installation and basic configuration. Advanced configuration options are
then described, followed by security implementation and setup information.
Performance and management issues are covered, as well as the
development of new modules that extend the functions of a Web server. This
redbook ends with a discussion of the migration from other Web server
software and of running applications on a Web server.
Marcus Brewer
Tara Campbell
Comments Welcome
Your comments are important to us!
Chapter 1. The History of the Apache Server
With the growing contributions to the World Wide Web, Web servers today
have advanced in their capabilities to provide information to the public. They
no longer only provide static information, but also provide dynamic
information based on the requests submitted from clients. Furthermore, data
processing was made available on Web servers, too, which allowed
applications to process input provided by users. Electronic commerce is now
possible over the World Wide Web and users can purchase or sell items with
just a click. Inevitably, the concern for security and reliability of a Web server
becomes an important issue.
The rate at which the Web community grows comes as no surprise, and one
of the reasons is probably the ease of use of Web browsers, the “user
interface” for the World Wide Web. Setting up a secure and reliable Web
server, on the other hand, requires more in-depth technical knowledge,
especially when commercial business is involved. This book
discusses the issues involved in setting up the IBM HTTP Server powered by
Apache (referred to simply as the IBM HTTP Server from here on) on IBM
RS/6000.
This first chapter gives you an overview of how the Apache server was
created, its features and IBM’s participation in the Apache project.
The Apache Group reviewed some of the enhancements and bug fixes and
added them to their own server for testing purposes. It was in April 1995 that
the Apache server made its first public release at Version 0.6.2. This name
was given since it is the “patched” version (A PAtCHy server) of the NCSA
HTTPd 1.3 Web server.
As an open source HTTP server (the term HTTP server is commonly used as
a synonym for Web server throughout the literature), the Apache server
has captured a substantial part of the Internet
market. According to the monthly Web server survey conducted by Netcraft
Ltd., the Apache server has already achieved more than 50% of the
market share (see Table 1, for the Netcraft survey results as of February
Thanks to the Apache Group and their goal to develop and support open
software, webmasters from all over the globe can obtain
the Apache server for free, including all the latest modules and fixes.
Although it is free, the Apache server runs on the most widely used
platforms with great performance and extensible features. It also keeps pace
with the ongoing development of the HTTP standards, such as the latest
HTTP/1.1 protocol, and remains compatible with HTTP/1.0. An impressive
graphical representation of the Netcraft Ltd. survey, showing the
growth of Web server usage over the past few years, can be found at:
http://www.netcraft.com/survey.
Apache is the most popular Web server, which speaks for the quality and
features it offers. Since it was developed with
platform independence in mind, the Apache server runs on all
major variants of UNIX, on Windows 95 and NT (only after Apache 1.3b3), and
on OS/2 Warp. It also supports the HTTP/1.1 protocol and utilizes
the APIs specific to the operating systems, such as the Internet Server
Application Programming Interface (ISAPI) used on Windows NT. The
modular design (to be covered in 2.2, “The Apache Server Model” on page
14) of the Apache server enables flexibility in customizing the Web server to
any specific environment. Apart from the standard modules that come with
the Apache server, new modules are also developed and contributed by
individuals and organizations to help other webmasters function effectively in
their environment. Besides adding more beneficial modules to further
improve the functionality of the server, any problem discovered can be
reported to the Apache Group and actions will be taken so that the
corresponding fix is readily available in a minimum amount of time. As of
today, this avalanche process of development on the Apache server, known
as the Apache Project, has already created a robust and reliable server as
the user base expands throughout the globe. The Apache Group has set forth
very high standards for code contributions, which results in a high level of
Webcrowler   http://www.webcrowler.com
Hotmail      http://www.hotmail.com
GeoCities    http://www.geocities.com
JavaSoft     http://java.sun.com
IBM will ship the Apache HTTP server with the IBM WebSphere Application
Server, helping current Apache users to evolve to e-business solutions. As
part of the WebSphere Application Server package, IBM will provide
commercial, enterprise-level support for the Apache HTTP Server. In
addition, IBM will be a full participant in the Apache HTTP Server Project, a
collaborative development effort, and will make contributions to enhance the
capabilities of the Apache HTTP Server.
What are the benefits of this partnership? By recognizing the quality
and power of the open source development of the Apache Group, IBM can
assure the delivery of an HTTP server that is exactly what customers want. As
the fastest growing server in the Internet world, the Apache server allows IBM
to follow the HTTP server market with the help of its leading momentum. In
addition, the large installed base and the webmasters who are already familiar
with the Apache server give IBM a good starting point in creating
awareness of the IBM WebSphere Application Server in the market (see note
below). An overview of how the Apache server fits into the IBM WebSphere
Application Server package is given in 2.4, “WebSphere and Apache” on
page 30.
Conversely, IBM helps the Apache Group to raise the status of a freeware
product in the larger enterprise market. Large corporations that hesitated to
adopt a widely available product with no brand name are now more
assured as IBM joins in supporting the development of the Apache server.
With the additional force of engineers and developers from IBM, the Apache
Group has a stronger development team focussed on providing open-source
software. IBM customers can benefit from having their common support
structure for the IBM HTTP Server (which is IBM’s name for the Apache
server) while non-IBM customers can benefit from quality and function
enhancements brought to the Apache server by IBM and its customers.
Installation of the IBM HTTP Server has been made easier by tailoring it to
each specific platform. System administrators of operating systems such as AIX
simply need to install the “installp” version through SMIT (System Management
Interface Tool), or use the “install shield” in an NT environment. The
downloaded Apache server, by contrast, although available in binary
versions for most operating systems, requires more steps before the relevant
files are extracted for installation and, if a customized version is desired,
the code must be compiled and linked.
The IBM HTTP Server comes as a compiled and tested version of the Apache
server for the specific platforms. Users have the convenience of using the
modular structure of the IBM HTTP Server and including or excluding modules to
suit their needs. All configurations pertaining to the IBM HTTP Server are
consolidated into one configuration file (httpd.conf) provided with the product, as
compared to the downloaded Apache server, which provides three files
(httpd.conf, srm.conf and access.conf) to comply with previous versions which
may require more maintenance. Furthermore, as the name implies, no
compilation is needed for the pre-compiled IBM HTTP Server. Installation of the
IBM HTTP Server on RS/6000 is explained in Chapter 3, “Installation and Initial
Setup” on page 33.
Due to U.S. export regulation restrictions, IBM cannot provide the source
code for the IBM HTTP Server because it would expose interfaces to SSL
libraries that are controlled by these regulations. However, webmasters can
still add modules dynamically to the IBM HTTP Server, as we will discuss in
2.2, “The Apache Server Model” on page 14 and Chapter 8, “Building HTTP
Server Modules” on page 177.
Developments made within IBM will go through an internal review before they
are submitted to the Apache Group for voting. Thus the compatibility between
the IBM HTTP Server and the Apache server is maintained. Similarly, any
problems reported by customers pertaining to the core server will be, perhaps
along with some suggested corrections, fed back to the Apache Group for their
necessary actions. However, bugs pertaining to the additional functions that IBM
offered will be dealt with by IBM developers.
For IBM customers, rather than posting the questions and doubts to the
Apache Group or searching the answers through the massive documentation
At present, the Apache Group is looking into these areas of enhancement for
the next release of the Apache server, Version 2.0:
• Multi-Threading – Multi-threaded support involves the use of only one HTTP
daemon handling requests via inter-process communication for the
coordination of operations such as content retrieval. This method, which is
already implemented in the Apache server for Windows NT, may provide
better overall performance on those platforms where threads are well
implemented. The current version running on UNIX spawns multiple
processes of the HTTP daemon to handle the requests, which might
consume more resources (but provides the advantage that a single process
can abort without affecting the others).
• Better system configuration – At present, the file Configure, which
contains hard-coded definitions for some particular platforms, is being
reviewed for better compatibility with platforms that are not
yet supported.
• Stabilized API model – The developers are deriving a more stable API
model to prevent disruption in module development and also to ease the
process of developing new modules. Documentation on the use of the
modules should also be informative for anyone who wants to write or use
the modules more efficiently in their environment.
• Configuration Syntax – The syntax of the
configuration files is also being reviewed to remove any limitations and
ambiguities for webmasters.
Since IBM is working together with the Apache Group, customers can expect
the above enhancements to be present in the IBM HTTP Server as well.
There are millions of people accessing the World Wide Web every day. This
is mostly because it requires only basic skills to use a Web browser to surf
the Web. Webmasters, on the other hand, need to know a little more. They
could, of course, simply follow guided instructions and have a Web server up
and running without the need of an in-depth understanding of the Web server
they are using. For those webmasters who want or need more than just
step-by-step checklists, this chapter lists and describes the working
mechanism of the Apache Web server. Since the IBM HTTP Server is basically
identical to the Apache server (with the exception of the added SSL support), the
original Apache server model is selected as a base for the discussion in this
chapter.
In the section that follows, some general features of the Apache server are
explained. In the second section of this chapter, some particulars of the
architecture are discussed. The sections that follow explain all the modules
that come with the Apache server and how the IBM HTTP Server fits in the
IBM WebSphere architecture.
Dynamic Shared Objects (DSO) – The most important feature of the Apache
server is probably the concept of modules. The core Apache server provides
basic Web server functionality while requiring very few resources. As
requirements grow, functionality can easily be added to the Apache server
through modules. There are a lot of modules available for the Apache server.
These modules can either be built into the core Web server or loaded
dynamically. Due to the U.S. government restrictions on the export of SSL
technology, the IBM HTTP Server cannot be shipped in a state that allows
modules to be statically built into the Web server core. Because of these
restrictions, modules can only be added to the IBM HTTP Server as dynamic
shared objects (DSOs).
Virtual Hosts – One of the key features of the Apache server is the support
of virtual hosts (to be further discussed in 5.1, “Virtual Hosts” on page 71). A
single server can serve requests sent to different server names or even IP
addresses. Thus, a single server can appear and behave exactly as if there
were more than one server involved. This feature is very useful for running a
single Web server machine to serve multiple domains.
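As an illustration, a minimal name-based setup might look like the sketch
below. The IP address, host names, and directory paths are placeholders chosen
for this example, not values taken from this book.

```apache
# Two name-based virtual hosts sharing one IP address.
# All addresses, names, and paths are placeholders.
NameVirtualHost 192.168.1.10

<VirtualHost 192.168.1.10>
    ServerName www.companya.com
    DocumentRoot /web/companya/htdocs
</VirtualHost>

<VirtualHost 192.168.1.10>
    ServerName www.companyb.com
    DocumentRoot /web/companyb/htdocs
</VirtualHost>
```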
Content Negotiation – Another key feature of the Apache server is its ability
to perform content negotiation. This feature allows the server to negotiate
with the client before retrieving the most appropriate information based on the
client's settings. In simpler terms, the browser tells the server how capable it
is to accept different kinds of content and the server determines the most
appropriate type of content to be returned to the browser. This is extremely
useful for serving selective contents such as language-specific documents
that are retrieved based on the settings on the browsers. Another common
use is requests for information in a particular media type,
such as graphic images (for example GIF or JPEG), where the browser can
inform the server of its preferences. The configuration required on the server
and the browser to support content negotiation is dealt with in 5.4, “Multiple
Language Support” on page 87.
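A server-side sketch of language negotiation could look as follows. It assumes
that documents are stored with language extensions (for example
index.html.en and index.html.fr); the directory path and the choice of
languages are illustrative only.

```apache
# Map file extensions to languages
AddLanguage en .en
AddLanguage fr .fr
AddLanguage de .de

# Order of preference when the browser states none
LanguagePriority en fr de

# Let the server pick among index.html.en, index.html.fr, ...
# when a client asks for index.html (path is a placeholder)
<Directory /web/htdocs/docs>
    Options +MultiViews
</Directory>
```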
Automatic Index File Selection and Index Creation – With this feature,
webmasters can specify default directory index files (for example index.html)
or the format of automatically created directory indexes to be returned to the
client when an index is requested for a specific directory, for example by
specifying a “/” character as the last character in a URL. The IBM HTTP
Server (Apache) supports several options for customizing automatically
created directory indexes, including HTML file headers, text files or even
CGI programs. To learn how to configure automatic directory
indexing, refer to 5.2, “Automatic Directory Indexing” on page 80.
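The following fragment sketches both halves of this feature: a list of index
files to look for, and a generated listing for a directory that has none. The
directory path and the HEADER and README file names are assumptions made for
the example.

```apache
# Files to try, in order, when a directory is requested
DirectoryIndex index.html index.htm

# Fall back to a generated listing in this directory (placeholder path)
<Directory /web/htdocs/download>
    Options +Indexes
    IndexOptions FancyIndexing
    # Text placed above and below the generated listing
    HeaderName HEADER
    ReadmeName README
</Directory>
```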
Scope              Description
Scalability        There is virtually no limit to the amount of compiled code
                   that can be (dynamically) added.
Development Time   The development and test cycle of a program can be
                   drastically reduced as it gets larger in size. Even small
                   changes can take an extraordinary amount of time to re-link
                   (and test) a whole executable if it has to be updated.
Disk Space         Dynamic linking saves disk space since common code is not
                   included in each executable program.
Down Time          The server need not be shut down in order to add new
                   dynamic modules.
Start Time         Start time is reduced because the shared library code does
                   not need to be loaded at start time, but only when required.
Note: It should be mentioned at this point that Apache also offers the option
of either DSOs or statically linking modules to the base server, which requires
not only a configuration change, but also a compile and link operation.
However, due to the addition of the SSL security code, which is subject to
U.S. export restrictions, the IBM HTTP Server cannot be recompiled. So, the
only option for modules with the IBM HTTP Server is through the DSO
support.
Using the second method, individual modules, when built as DSOs, can be
easily added by using the LoadModule and AddModule directives in the
server configuration file. Also, a new program called apxs (originated from
“APache eXtenSion”) is available to simplify the creation of DSO files for the
Apache modules. The use of DSOs is increasing and Apache already
supports dynamic modules on the operating systems listed in Table 4.
Table 4. Tested Platforms Supporting DSO
Operating System      Version
FreeBSD               2.1.5, 2.2.5, 2.2.6
OpenBSD               2.x
NetBSD                1.3.1
Linux                 Debian/1.3.1, RedHat/4.2
Solaris               2.4, 2.5.1, 2.6
SunOS                 4.1.3
Digital UNIX          4.0
IRIX                  6.2
HP/UX                 10.20
UnixWare              2.01, 2.1.2
SCO                   5.0.4
AIX                   3.2, 4.1.5, 4.2, 4.3
ReliantUNIX/SINIX     5.43
SVR4                  -
The concept of DSOs is one of the key features of the Apache server. With this
dynamic capability, as briefly mentioned in Table 3 on page 15, modules can
be loaded into the server process space at run-time, resulting in a
significant overall reduction in memory usage. In addition, the development
process for extending the functionality of the server is greatly simplified
because the server itself need not be re-compiled. For further information
about implementing modules, see Chapter 8, “Building HTTP Server Modules” on
page 177, or read the documentation at http://www.apache.org/docs/dso.html.
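As a rough sketch, loading a DSO module through the configuration file might
look like this. The libexec path is an assumption and depends on where the
server was installed; mod_example.c in the comment is a hypothetical source
file used only to show the apxs call.

```apache
# Load a module that was built as a DSO (file path is an assumption)
LoadModule status_module libexec/mod_status.so

# Add the module to the list of active modules; this is needed when the
# list has been emptied beforehand with ClearModuleList
AddModule mod_status.c

# A module of your own could be compiled, installed, and activated
# with apxs, for example:
#   apxs -c -i -a mod_example.c
```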
[Figure 1: block diagram with the labels “Configuration and Initialization”,
“Config Phase”, and “Request Phase”]
Figure 1 shows a simplified model of the Apache
server. The parent at the top depicts the Apache main server process setting
up its own environment before multiple child processes are spawned based
on the specific configuration parameters. Basically, the entire operation can
be divided into the processes as shown, where the parent reads the
configuration files and performs module initialization. Thereafter, the server
deals with initialization of the child processes to handle requests from the
clients. As mentioned earlier, the HTTP processes can be viewed as the core
executable programs in the DSO concept discussion, while the modules are
the dynamic modules loaded at run-time. Most modules support their specific
set of directives. Directives are used widely in the configuration file(s) to
denote a certain behavior or instructions to be performed or enforced on
resources, such as files or directories.
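The configuration parameters that govern how many child processes the parent
spawns are not named in the text above; in Apache 1.3 they are typically the
directives sketched below. The numbers are illustrative values only, not
tuning recommendations.

```apache
# Number of child processes created at startup
StartServers        5

# The parent keeps the number of idle children within these bounds
MinSpareServers     5
MaxSpareServers     10

# Upper limit on simultaneously served clients (child processes)
MaxClients          150

# Requests a child handles before it is replaced (0 = never)
MaxRequestsPerChild 0
```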
The directives within the server’s configuration file(s) are the webmaster’s
controls for the modules. Some modules require little configuration, while
others support a fairly large number of directives that control their operation.
mod_access
This module deals with basic security checks based purely on hostname or IP
address at the early stages of request parsing.
Directives:
• order – 'allow,deny', 'deny,allow', or 'mutual-failure'
• allow – 'from' followed by hostnames or IP-address wildcards, or ’env=’
• deny – 'from' followed by hostnames or IP-address wildcards or ’env=’
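For example, access to a directory might be limited to one network and one
domain as sketched below; the path, the partial IP address, and the domain
name are placeholders.

```apache
<Directory /web/htdocs/internal>
    # Evaluate deny rules first, then allow rules; allow wins on a match
    order deny,allow
    deny from all
    # A partial IP address and a partial domain name
    allow from 10.1 .companya.com
</Directory>
```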
mod_auth
This module deals with user security checks based on text files after the
Check Access stage.
Directives:
• AuthUserFile – Text file containing usernames and passwords
• AuthGroupFile – Text file containing group names and member usernames
• AuthAuthoritative – Set to 'off' to allow access control to be passed along
to other modules if the user ID is not known to this module
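A minimal sketch of a directory protected with text-file authentication
follows. The file locations, realm name, and group name are assumptions; the
AuthType, AuthName, and require directives belong to the server core rather
than to this module.

```apache
<Directory /web/htdocs/members>
    AuthType Basic
    AuthName "Members Area"
    # Password file, created for example with: htpasswd -c <file> <user>
    AuthUserFile  /usr/local/apache/conf/users
    # Group file with lines of the form  staff: alice bob
    AuthGroupFile /usr/local/apache/conf/groups
    require group staff
</Directory>
```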
mod_auth_anon
This module offers anonymous login, using “anonymous” as the username and
the e-mail address of the requesting user as the password,
similar to the FTP implementation.
Directives:
• Anonymous – A list of user IDs separated by spaces
• Anonymous_MustGiveEmail – Controls the need for an e-mail address
• Anonymous_NoUserId – If set ’on’, no user ID is required
• Anonymous_VerifyEmail – If set ’on’, e-mail address is verified
• Anonymous_LogEmail – If set ’on’, e-mail address is logged
• Anonymous_Authoritative – If set 'on', the user ID must match one of the
user IDs specified in the list under the Anonymous directive
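Put together, an anonymous-access area could be configured roughly as
follows. The directory path and realm text are placeholders, and the sketch
assumes no other authentication module is configured for this directory.

```apache
<Directory /web/htdocs/pub>
    AuthType Basic
    AuthName "Use 'anonymous' and your e-mail address"
    # User IDs accepted as anonymous
    Anonymous anonymous guest
    Anonymous_NoUserId      off
    Anonymous_MustGiveEmail on
    Anonymous_VerifyEmail   on
    Anonymous_LogEmail      on
    Anonymous_Authoritative off
    require valid-user
</Directory>
```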
Directives:
• AuthDBUserFile – DB file containing usernames and passwords
• AuthDBGroupFile – DB file containing group names and member
usernames
• AuthDBAuthoritative – Set to 'off' to allow access control to be passed
along to other modules if the user ID is not known to this module
mod_auth_dbm
This module offers user security checking using DBM files which are more
efficient and convenient to manage when the number of users is large. A
DBM file contains a key (normally the username) used for fast retrieval
through indexing, and a value (the encrypted password).
Directives:
• AuthDBMUserFile – DBM file containing usernames and passwords
• AuthDBMGroupFile – DBM file containing group names and member user
names
• AuthDBMAuthoritative – Set to 'off' to allow access control to be passed
along to other modules if the user ID is not known to this module
mod_digest
This module offers user security checking using MD5 Digest Authentication.
With Digest Authentication, the client browser does not send the user’s
password itself but an MD5 digest computed from it, which the server
compares against a digest it computes on its own. With this module, the
server is capable of handling this kind of security measure, provided the
client’s browser is able to compute the digest.
Directives:
• AuthDigestFile – digest file containing user IDs and passwords
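A sketch of a directory using digest authentication is shown below. The path,
realm, and file name are assumptions for illustration.

```apache
<Directory /web/htdocs/private>
    AuthType Digest
    AuthName "Private Area"
    # Digest file, created for example with:
    #   htdigest -c <file> "Private Area" <user>
    AuthDigestFile /usr/local/apache/conf/digest-users
    require valid-user
</Directory>
```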
mod_alias
This module translates Web addresses to filesystem locations in the
document tree.
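The directive list for this module is not reproduced here; as a hedged
illustration, the commonly used mod_alias directives look like this (all
paths and the redirect target are placeholders):

```apache
# Map a URL prefix to a directory outside the document root
Alias /icons/ /usr/local/apache/icons/

# Same, but treat everything in the target directory as CGI scripts
ScriptAlias /cgi-bin/ /usr/local/apache/cgi-bin/

# Tell the client to fetch the resource from a new location
Redirect /old http://www.companya.com/new
```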
mod_dir
This module tells the server the name of the file to return as the index of the
directory being accessed. It comes into play when someone connects to a
Web site with a URL that ends in a slash or a directory name, not a file name,
as for example in http://www.CompanyA.com/.
Directives:
• DirectoryIndex – Identifies the index file that Apache should look for before
creating a dynamic directory index
mod_mime
This module informs the server of the type of files based on the file
extensions.
Directives:
• AddType – A mime type followed by one or more file extensions for
determination of content type
• AddEncoding – An encoding (such as gzip), followed by one or more file
extensions for determination of encoding type
• AddLanguage – A language (such as fr), followed by one or more file
extensions for determination of language type
• AddHandler – A handler name followed by one or more file extensions for
assignment of handler to react to these files
• ForceType – Forces a media type
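The directives above might be used as in the following sketch; the extensions
and the directory path are example values.

```apache
AddType     text/html   .shtml
AddEncoding x-gzip      .gz
AddLanguage fr          .fr
AddHandler  cgi-script  .cgi

# Treat every file in this directory as a GIF image,
# regardless of its extension (placeholder path)
<Directory /web/htdocs/gifs>
    ForceType image/gif
</Directory>
```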
mod_mime_magic
This module determines the MIME type of a file based on magic numbers and
some bytes in the contents of the file.
Directives:
• MimeMagicFile – Name of the MIME Magic file
mod_negotiation
Based on the client’s capability to accept the requested document, it selects
the most appropriate one and returns it to the client. There are two ways of
handling such situations, namely by using file extensions that map to the
standard language tag (application/x-type-map) or by using a variants file that
categorizes all the documents along with their representation types
(type-map), according to the common resources they represent.
Directives:
• CacheNegotiatedDocs – No arguments (either present or absent), but
provides caching of documents on proxy servers
• LanguagePriority – A list of MIME language abbreviations separated by
space for the language selection priority when no preference is stated
from client’s browser
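A sketch of the type-map approach follows. The .var extension and the file
names in the comments are assumptions chosen for this example.

```apache
# Treat files ending in .var as type maps
AddHandler type-map var

# Allow proxies to cache negotiated documents
CacheNegotiatedDocs

# A type map such as foo.var lists the variants of one resource, e.g.:
#   URI: foo
#
#   URI: foo.en.html
#   Content-type: text/html
#   Content-language: en
#
#   URI: foo.fr.html
#   Content-type: text/html
#   Content-language: fr
```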
mod_rewrite
This module is not compiled by default, but it is a powerful tool to rewrite
URLs on the fly. It supports an unlimited number of rules that operate on many
variables, such as server variables, environment variables, HTTP headers,
and timestamps, as well as their respective conditions. It operates
on full URLs in both the server context and the directory context.
Directives:
• RewriteEngine – On or Off to enable or disable (default) the whole
rewriting engine
• RewriteOptions – List of option strings to set
• RewriteBase –The base URL of the per-directory context for rewrites
• RewriteCond – An input string and a to-be-applied regular expression
pattern for definition of a rule condition
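As a small sketch, the fragment below serves a different home page to one
browser. It uses the RewriteRule directive, which does not appear in the list
above, and the target path is a placeholder.

```apache
RewriteEngine on

# Condition: the User-Agent header starts with "Lynx"
RewriteCond %{HTTP_USER_AGENT} ^Lynx

# Rule: a request for "/" is then rewritten to a text-only page;
# [L] stops rule processing at this point
RewriteRule ^/$ /text-only/index.html [L]
```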
mod_userdir
This module deals with translation of URLs to a user’s home directory.
Directives:
• UserDir – The public subdirectory in users' home directories, or 'disabled',
or 'disabled username username...', or 'enabled username username...' to
govern the resources
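For instance, with the settings sketched below a request for /~alice/ would be
served from the public_html subdirectory of that user's home directory, while
the root user is excluded. The user name alice is hypothetical.

```apache
# http://server/~alice/  ->  ~alice/public_html/
UserDir public_html

# Never map requests to the home directory of these users
UserDir disabled root
```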
mod_env
This module passes environment variables to scripts such as CGI or SSI. The
variables can be set or unset unconditionally or exported from the server’s
environment to the document’s environment for use.
Directives:
• PassEnv – A list of environment variables to pass to CGI
• SetEnv – An environment variable name and a value to pass to CGI
• UnsetEnv – A list of variables to remove from the CGI environment
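A short sketch of these three directives; the variable names and the value are
made up for the example.

```apache
# Hand the server's own LIBPATH variable on to scripts
PassEnv  LIBPATH

# Define a variable with a fixed value
SetEnv   APP_MODE production

# Make sure scripts never see this variable
UnsetEnv DEBUG
```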
mod_info
This module provides a great deal of information about the server settings
and its environment. This module is not found in the IBM HTTP Server, but the
procedure to include it is found in 8.2, “The Apache Information Module
(mod_info)” on page 185.
Directives:
mod_log_agent
This module logs the UserAgent header of the client. This enables the server
to know what software (mainly the browser) the client is using to send the
request.
Directives:
• AgentLog – The filename of the agent log containing the UserAgent
header
mod_log_config
This module is much more flexible than the module above. Basically it
enables webmasters to log anything, anywhere. This implies any number of
log files can be used to keep track of any information pertaining to a specific
virtual host or the entire server.
Directives:
• CustomLog – A file name and a custom log format string or format name
where log records are written
• TransferLog – The filename of the access log based on definitions under
the LogFormat directive
• LogFormat – A log format string and an optional format name that
customizes the format of the default log file
• CookieLog – The filename of the cookie log for the logging of cookies
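The following sketch defines a named format and writes two log files. The
file names are placeholders; the same directives can also be placed inside a
virtual host section to log that host separately.

```apache
# Define a named format ...
LogFormat "%h %l %u %t \"%r\" %>s %b" common

# ... and write an access log that uses it
CustomLog logs/access_log common

# A second log with an inline format: one User-Agent header per request
CustomLog logs/agent_log "%{User-Agent}i"
```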
mod_log_referer
This module keeps track of the external URLs that link to the pages on
our server. This means that we are able to know where the client “jumped”
from and which page on our server it was referred to.
Directives:
• RefererLog – The filename of the referer log containing the source referer
header
• RefererIgnore – Referer hostnames to be ignored in the referer log file
mod_status
This module provides statistics on the “health” of the server. Unlike
mod_info, which provides information about the server configuration, this
module shows the current activities of the server.
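The directives for this module are not listed here. As a hedged sketch, the
status page is usually enabled by assigning the server-status handler to a
URL and restricting who may see it; the URL and domain below are placeholders.

```apache
<Location /server-status>
    SetHandler server-status
    # Only hosts from this domain may view the status page
    order deny,allow
    deny from all
    allow from .companya.com
</Location>
```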
mod_unique_id
This module allocates a unique magic number per request, which is assigned
to the environment variable UNIQUE_ID. The use of this magic number is
similar to the use of the process ID in the UNIX operating system for server
management purposes.
Directives: none
mod_usertrack
This module tracks the client’s movement or traversal in the document tree of
the Web server by using cookies.
Directives:
• CookieExpires – An expiry date code for the cookie to expire
• CookieTracking – Determines whether to enable cookies
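A sketch of enabling tracking and recording the cookie in a separate log is
given below. The expiry period and log file name are example values, and the
CustomLog directive belongs to mod_log_config.

```apache
CookieTracking on
CookieExpires "2 weeks"

# Log the tracking cookie together with the request and the time
CustomLog logs/clickstream "%{cookie}n %r %t"
```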
mod_actions
This module informs the server which CGI script to execute based on the
MIME type specified in the request.
Directives:
• Action – A media type followed by a script name to be used for execution
• Script – A method followed by a script name to be used for execution
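Both directives might be used as follows; the script paths are hypothetical.

```apache
# Run this script whenever a GIF image is requested
Action image/gif /cgi-bin/images.cgi

# Run this script for requests that use the PUT method
Script PUT /cgi-bin/put.cgi
```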
mod_autoindex
This module provides directory listings for the users. A directory listing,
which contains information such as the directory and file sizes, is generated
either from an index file that has been created or by the server itself.
Directives:
• AddIcon – An icon URL followed by one or more filenames
• AddIconByType – An icon URL followed by one or more MIME types
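As an illustration, icons for a generated listing could be assigned like
this. The icon paths are assumptions, and DefaultIcon is a further
mod_autoindex directive not shown in the list above.

```apache
# Icon chosen by file name suffix
AddIcon /icons/tar.gif .tar

# Icon chosen by MIME type, with alternate text for text browsers
AddIconByType (IMG,/icons/image2.gif) image/*

# Icon for everything else
DefaultIcon /icons/unknown.gif
```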
mod_cgi
This module deals with the creation of the environment variables that contain
information about the server and the clients to be passed on to the CGI
scripts for their necessary execution. Files with MIME type
application/x-httpd-cgi or handler cgi-script are executed as CGI scripts by
the server, which returns the corresponding results to the clients. There are
also other types of scripts such as Perl, PHP and FastCGI that are supported
by the Apache server. Though the modules supporting them such as
mod_perl, mod_php and mod_fastcgi are not standard modules, the
implementation procedures are covered in Chapter 8, “Building HTTP Server
Modules” on page 177.
Directives:
• ScriptLog – The name of a log for script debugging info
• ScriptLogLength – The maximum length (in bytes) of the script debug log
• ScriptLogBuffer – The maximum size (in bytes) to record of a POST
request
mod_imap
This module deals with image mapping of graphical image maps to Web page
locations. Besides supporting the traditional use of a CGI-program to do the
coordinates to document mapping, the Apache server offers this module to
Directives:
• ImapMenu – The type of menu generated: none, formatted,
semiformatted, unformatted
• ImapDefault – The action taken if no match: error, nocontent, referer,
menu, URL
• ImapBase – The base for all URLs: map, referer, URL (or start of)
mod_include
This module allows documents to be included within other documents and handles
the parsing of server-parsed documents through the server-parsed handler.
Directives:
• XBitHack – Set Off, On, or Full to control the parsing of HTML documents
mod_speling
This module is not compiled in by default, but it is a useful tool that lets
the server check and correct the spelling of the requested URL. Basically, it
compares the names of all documents in the requested directory against the
name of the requested document, ignoring case and even
allowing up to one misspelling in the word, before giving up and returning an
error message to the client.
Directives:
• CheckSpelling – Determines whether to fix miscapitalized/misspelled
requests
mod_setenvif
This module sets environment variables based on attributes of the request, so
that other parts of the server can decide which actions to take.
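A hedged sketch of how this might look, using the SetEnvIf and BrowserMatch directives from the standard Apache documentation. The variable names are arbitrary examples:

```apache
# Set object_is_image=gif when the requested URI ends in .gif
SetEnvIf Request_URI "\.gif$" object_is_image=gif

# Set the variable netscape when the User-Agent header starts with "Mozilla"
BrowserMatch ^Mozilla netscape
```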
mod_asis
This module allows files to be sent without the server adding most of the
usual HTTP headers; instead, the files contain their own headers. This
implies that all kinds of data can be sent
from the server without using CGI scripts.
Directives: none
mod_cern_meta
This module emulates the metafile semantics of the CERN Web server. Metafiles
contain HTTP headers that are output in addition to the default HTTP headers
for each file accessed.
Directives:
• MetaFiles – Limited to on or off for Meta file processing
• MetaDir – The name of the directory containing meta files
• MetaSuffix – The filename suffix for meta files
mod_expires
This module sets the expiration time of the Web document the client
requested, using the Expires HTTP header, so that the client can treat a copy
fetched from its local cache as valid until that time. The expiration time
can be set in two ways: relative to the document’s last-modified time or
relative to the time of the client’s access.
Directives:
• ExpiresActive – Limited to on or off for generation of Expires header
• ExpiresByType – A MIME type followed by an expiry date code
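The expiry date code can be sketched as follows, assuming the compact syntax of the standard Apache module, in which a leading A counts seconds from the client’s access and a leading M counts seconds from the file’s last modification. The MIME types and intervals are examples only:

```apache
# Turn on generation of the Expires header
ExpiresActive on

# GIF images expire one month (2592000 seconds) after the client's access (A)
ExpiresByType image/gif A2592000

# HTML documents expire one week (604800 seconds) after the file's
# last modification (M)
ExpiresByType text/html M604800
```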
mod_headers
This module modifies the HTTP response headers before the response is
returned to the client. Headers can be added,
removed or replaced.
Directives:
• Header – An action, a header name and a value for adding, modifying or removing a header
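A minimal sketch of the three kinds of action, assuming the set, append and unset keywords of the standard Apache module. The header names and values are examples, not recommendations:

```apache
# Replace any existing header of this name
Header set Author "Web Team"

# Append a value to an existing header, or create the header if it is missing
Header append Cache-Control no-cache

# Remove a header from the response (hypothetical header name)
Header unset X-Internal-Info
```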
mod_so
This module is used to load modules into the server at runtime.
Directives:
• LoadModule – A module name and the name of a shared object file to load
it from
• LoadFile – Shared object file or library to be loaded into the server at
runtime
[Figure: WebSphere Application Server components: HTTP Engine, WebSphere API, Servlet Engine, EJB Interface]
Basically, there are three main engines that deal with the entire operation.
They are the HTTP Engine, the Servlet Engine and the Enterprise Java Bean
Engine. The HTTP Engine sits on top of the operating system (with Java
capability) and is the layer that deals directly with the client’s requests from the
browsers. Most of the requests, such as HTML documents, CGI scripts, GIF
images, and so forth, are under the category Static Requests and are dealt with
by the HTTP engine. The WebSphere Application Server provides an Apache
module which lets the Apache server exploit the services provided by the
WebSphere Server. Since WebSphere can be used in a Netscape Web server
via NSAPI, or with Internet Information Server via ISAPI, the migration path from
other servers to Apache can be very smooth. Requests pertaining to Java Server
Pages (JSP) and servlets are also handled by the Servlet Engine. Last, but not
least, stands the Enterprise Java Beans (EJB) engine which is dedicated to
IBM is shipping the IBM HTTP Server with WebSphere Application Server
V2.0. WebSphere will also work with the original Apache Web server (among
other Web servers), but requires a different module than the IBM HTTP
Server.
This chapter covers the installation of the IBM HTTP Server on an RS/6000
machine and the initial setup necessary to start it for the first time. First, the
contents of the IBM HTTP Server product file packages are listed, followed by
the hardware and software prerequisites that are required in order to run this
server. The installation, as explained later in this chapter, is an easy step and
it uses standard methods provided by the AIX operating system. The chapter
then describes some minimal setup that might be necessary to run the server,
and some hints are provided in case the server does not start
successfully after installation.
IBM employees can also download the IBM HTTP Server from the internal
URL http://w3.software.ibm.com/webservers/html/downloads.html.
The IBM HTTP Server comes in several file packages which contain the IBM
HTTP Server and SSL filesets as follows:
• Base package, without SSL security:
• http_server.base – Contains the IBM HTTP Server base and source
filesets
• SSL module and SSL library packages (required for SSL):
• http_server.modules – Contains the IBM HTTP Server SSL module
fileset
• gskrf301 – Contains the gskrf301.base fileset, which contains the base SSL
libraries for use in France and is a prerequisite for the other SSL
filesets
• gskre301 – Contains the gskre301.base fileset, which contains
additional SSL libraries for export outside U.S. and Canada (excluding
France)
• gskru301 – Contains the gskru301.base fileset, which contains the
additional, export-controlled SSL libraries for use in the United States
and Canada
Note on Packaging
At the time of writing, packaging of the IBM HTTP Server was still subject
to change. You should check WebSphere installation media and/or the IBM
Web server Web site for latest information about packaging and availability
at http://www.software.ibm.com/webservers.
Fileset               Loadable modules
--------------------  --------------------
http_server.base      mod_access.so
                      mod_actions.so
                      mod_alias.so
                      mod_asis.so
                      mod_auth.so
                      mod_auth_anon.so
                      mod_auth_dbm.so
                      mod_autoindex.so
                      mod_cern_meta.so
                      mod_cgi.so
                      mod_digest.so
                      mod_dir.so
                      mod_env.so
                      mod_expires.so
                      mod_headers.so
                      mod_imap.so
                      mod_include.so
                      mod_log_agent.so
                      mod_log_config.so
                      mod_log_referer.so
                      mod_mime.so
                      mod_mime_magic.so
                      mod_negotiation.so
                      mod_rewrite.so
                      mod_setenvif.so
                      mod_speling.so
                      mod_status.so
                      mod_unique_id.so
                      mod_userdir.so
                      mod_usertrack.so
http_server.modules   mod_ibm_ssl.so
The standard Apache modules that are not listed in Table 6 (and thus not
included as loadable modules with the IBM HTTP Server) are:
• mod_auth_db – This module performs basic authentication using
Berkeley-type DB authentication files. The IBM HTTP Server supports DBM
files through the mod_auth_dbm module.
The file name extension used for compiled modules is .so, which stands for
shared object. More information on using and customizing some of these
modules can be found in Chapter 5, “Advanced Configuration” on page 71.
http://www.software.ibm.com/webservers
Note: This description only applies to the installation of the IBM HTTP
Server. If you install it together with the IBM WebSphere Application Server
V2.0, please also check the prerequisites for that product.
The requirements listed above are minimal. In fact, the IBM HTTP Server will
run in a very limited environment. However, if you plan to run a
high-performance, large-scale Web server, you will certainly need
more CPU, memory and disk resources. Additional information on
performance and scalability can be found in Chapter 7, “Performance and
Scalability” on page 153.
All files of the IBM HTTP Server are installed underneath the
/usr/lpp/HTTPServer directory. This includes the executable binaries, log
files, online documentation, and others. There is a Readme.httpserver file in
/usr/lpp/HTTPServer that you should consult after installation of the IBM
HTTP Server or any updates to it.
The installation is fairly easy and straightforward, but if you are unfamiliar
with the standard product installation process or with the SMIT tool on AIX,
you might want to refer to the AIX Installation Guide, SC23-4112, before you
start with the installation.
COMMAND STATUS
[TOP]
installp -acgNQqwX -d /inst.images/apache -f File 2>&1
File:
gskrf301.base 3.0.1.30
gskre301.base 3.0.1.30
http_server.base.core 1.3.3.0
http_server.base.source 1.3.3.0
http_server.modules.ssl 1.3.3.0
+-----------------------------------------------------------------------------+
Pre-installation Verification...
+-----------------------------------------------------------------------------+
Verifying selections...done
Verifying requisites...done
[MORE...66]
This completes the basic installation of the IBM HTTP Server. The default
directory structure and files, as explained in 3.4, “Default File and
Directory Structure” on page 37, are now in place.
# lslpp -l gskr\*
Fileset Level State Description
--------------------------------------------------------------------------
Path: /usr/lib/objrepos
gskrf301.base 3.0.1.30 COMMITTED gskrf301 for AIX
gskre301.base 3.0.1.30 COMMITTED gskre301 for AIX
Note: The version numbers and fileset descriptions shown above were
correct at the time this book was written. They may be different at a later
date.
First of all, for security reasons, a Web server should never be run with root
authority; it should only be run as a user application under a user that has
limited privileges in the operating system. To make things a bit more
confusing, however, a Web server must start with root authority in order to
have sufficient privileges to open port 80 (the default HTTP protocol port).
The solution to this is that the IBM HTTP Server runs a main process under
root, which then in turn spawns child processes that change their user
identity to whatever user and group are configured in the server’s
configuration file. (There are actually more reasons for spawning multiple
processes; this is only one of them.) By default, these child processes run
under the user and group nobody. Since nobody is an anonymous user, not
only used for a Web server, you may consider creating a separate user and
group for the IBM HTTP Server to run under. This way, you can keep better
As a further security precaution, this user does not need to have a login
password and login should be disabled, thus preventing the user account
from being misused, intentionally or unintentionally.
After you have decided on and created a separate user and group, you must
edit the server configuration file to reflect these changes. The default
configuration file that you need to edit for the IBM HTTP Server is
/usr/lpp/HTTPServer/etc/httpd.conf. Locate the directives User and Group
and change their default settings from nobody to the user and group name
that you have created.
The IBM HTTP Server runs, as mentioned in the previous section, as a main
process and one or more child processes. The main process runs with root
authority, while the child processes run under a different user authority. These
child processes, not the main process, do the actual Web serving work. This
architecture allows for better performance and parallelization of client
requests. The number of child processes is automatically adapted to the
server’s load; the minimum and maximum number can be configured in the
configuration file (along with some other numbers). By default, there is a
minimum of five child processes and a maximum of 150.
Once the server is running, these processes can be seen by running the ps
command:
# ps -ef | grep httpd
www 27024 27454 0 13:50:54 - 0:00 /usr/lpp/HTTPServer/sbin/httpd
www 27160 27454 0 13:50:56 - 0:00 /usr/lpp/HTTPServer/sbin/httpd
root 27454 1 0 13:50:52 - 0:00 /usr/lpp/HTTPServer/sbin/httpd
www 27658 27454 0 13:50:53 - 0:00 /usr/lpp/HTTPServer/sbin/httpd
www 28132 27454 0 13:50:53 - 0:00 /usr/lpp/HTTPServer/sbin/httpd
The main httpd process supervises the child processes. If, for any reason, a
child process terminates, the main process decides whether to start a new one,
depending on certain criteria, such as server load and configuration parameters.
Each child process has a counter for the client requests it has served. After
reaching a certain maximum (the default is 10000000), it terminates to allow
a new process to be started. This is a deliberate and elegant way to
circumvent problems that commonly arise in long-lived processes, such as
memory leaks. At the time of writing, there was some work in progress on a
module that dynamically adapts the maximum number of requests per child
process as opposed to the currently fixed number.
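The limits described in this section correspond to directives in httpd.conf. The fragment below is a sketch using the standard Apache 1.3 directive names. The values for MaxClients and MaxRequestsPerChild are the defaults quoted above; the remaining values, and the assumption that the minimum of five refers to StartServers and MinSpareServers, are illustrative and may differ from the shipped file.

```apache
# Number of child processes created at startup (assumed value)
StartServers 5

# The main process keeps the number of idle children within these bounds
# (assumed values)
MinSpareServers 5
MaxSpareServers 10

# Upper limit on simultaneously running child processes
MaxClients 150

# Number of requests a child serves before it terminates and is replaced
MaxRequestsPerChild 10000000
```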
3.8 Running the IBM HTTP Server for the First Time
After you have completed the minimal server customization as described in
3.6, “Initial Setup” on page 42, the server is ready to be started. Although this
is certainly not the final configuration in which the server will eventually run, it
is a good idea at this time to check whether the basic server installation and
the other involved operating system components work well.
You will most likely receive the positive output as shown in the last step
above, indicating that the httpd process (actually the IBM HTTP Server) was
successfully started. You may also check this with the ps command as shown
in the previous Section 3.7, “Server Process Structure” on page 43.
If there was a problem, chances are that an additional error message is
returned along with the message “httpd could not be started”.
Another, even better place to check for errors is the error log file at
/usr/lpp/HTTPServer/var/log/error_log. The most common errors that cause
starting the IBM HTTP Server to fail are either a lack of permissions (not
being root) or some sort of resource problem, such as a full filesystem.
Note that the default configuration of the IBM HTTP Server is such that this
welcome screen is shown when no other specific page is requested in the
URL (in the example shown in Figure 4:
http://my_test.itso.austin.ibm.com/).
lynx is a text-based Web browser that is not available with AIX or the IBM
HTTP Server. It can be downloaded, however, from various Web sites,
such as http://www-frec.bull.com/.
You should note that apachectl is a shell script that assumes the default
directory and file structure. If you change these defaults, for example, to
adapt to the directory structure explained in 4.1, “Recommended Directory
Structure” on page 49, you might have to adapt this shell script in order for it
to work properly.
(or smitty remove for the text-based version of SMIT) on the command line to
get to the SMIT Remove Installed Software panel. Use the PF4 key to display
a list of installed software, from which you should select (PF7) the filesets
(http_server.* and, if applicable, gsk*) to remove.
Bear in mind that this only removes the standard files that were installed with
the IBM HTTP Server. If you changed the directory structure or added links to
the executables (for example as explained in 4.1, “Recommended Directory
A Word of Caution
As always, when removing installed software from a system, exercise
extreme care to select the correct filesets that you want to remove.
Selecting the wrong filesets permanently removes those components from
your system.
After you have successfully installed the IBM HTTP Server on your system in
a way that it starts with a minimum setup, there are some additional basic
configuration steps that can or need to be done to further meet your needs.
Configuration of the IBM HTTP Server is done by means of a configuration
file, introduced in the previous chapter, that the server reads when it is being
started or restarted. This chapter lists and explains the most common
directives used in the configuration file that will be used in most IBM HTTP
Server installations. But, before the configuration file is examined, a
recommended directory structure is introduced in the first section that you
might find more suitable than the default as explained in 3.4, “Default File and
Directory Structure” on page 37. Furthermore, a more comfortable way of
automatically starting and stopping is described in this chapter.
The configuration file shipped with the IBM HTTP Server contains directives
that specify the location of a number of files and directories. (Note that in an
actual installation there can be more than just one configuration file.)
Because of this, whenever such a file or directory is moved to another
location, the respective directive in the configuration file(s) must be changed
as well and the server must be restarted in order to recognize those changes.
Table 8 on page 50 lists and explains the directives in the server configuration
file(s) that relate to other files and directories. Also listed are their default and
recommended settings. As you can see, the main purpose of this
Notes:
• General – The /var filesystem was chosen for the above
recommendations because it is the commonly used place for log files. Be
careful, however, since log files can grow and may even use all available
space in /var, preventing other services from functioning properly (or
vice-versa). You might instead choose to create a separate filesystem for
the log files to completely uncouple them from other services running on
the same machine.
• Shared File Systems – It is recommended to store the log, lock, and error
files on a local filesystem rather than on a remote filesystem.
• PidFile – The apachectl utility (see 3.8.1, “The apachectl Utility” on page
45) is a UNIX shell script that uses the PidFile. You must also adapt (edit)
Basic Configuration 51
The following two sections describe methods for implementing such
automatic startup and shutdown processes.
Follow these steps to make the HTTP Server start up automatically when the
system boots up:
1. Log in as the root user on the system where the HTTP Server will be started
automatically.
2. Create a simple startup shell script /etc/rc.httpd for starting the IBM HTTP
Server, such as:
#!/usr/bin/ksh
# Configures the Automatic Startup of the IBM HTTP Server
# This file should be owned by root:system and have permissions 0774
BINPATH=/usr/lpp/HTTPServer/sbin
The last step adds the /etc/rc.httpd command as the very last entry to the
/etc/inittab file. In other words, the HTTP Server will only be started when all
other services have already been started. Note that the mkitab command has
options that allow you to add the /etc/rc.httpd command at any specific
place in /etc/inittab, should you choose to have the HTTP Server started
before certain other services are started.
If the Web server does not start successfully, the log files are a good place to
start looking for the cause. One common problem with SSL that hinders the
automatic startup of the Web server is when the password for the key
database is not stored (stashed) in a file. In this case, the server startup
hangs when it tries to ask for the key database password (see 6.4, “Secure
Sockets Layer, SSL” on page 129 for more information).
Like other UNIX operating systems, AIX provides an option to run a shell
script when the system is being shut down. When AIX is being shut down, it
checks for the existence of the file /etc/rc.shutdown (note that the file
name must match exactly, and the file must be executable). If such a file is found, it will be
executed early in the shutdown process. If your system does not already have
an /etc/rc.shutdown file, you should create it. Otherwise, append the relevant
statements to it. Below is an example of an /etc/rc.shutdown file that shuts the
IBM HTTP Server down.
#!/usr/bin/ksh
BINPATH=/usr/lpp/HTTPServer/sbin
echo "Shutting down the IBM HTTP Server..."
$BINPATH/apachectl stop
exit 0
Note
If /etc/rc.shutdown does not terminate successfully, that is, if it returns a
non-zero return code, the system shutdown process stops. For this reason,
the /etc/rc.shutdown script should always exit with return code zero (exit 0),
unless there is a strong reason for not doing so.
Note that the apachectl command, called in the example above, is written in a
way that it will return even if the IBM HTTP Server is not running or, even
worse, not responding to any commands. In the shutdown script you should
not run commands that might hang, or, for example, wait for some input from
a user. More information on the apachectl command can be found in 3.8.1,
“The apachectl Utility” on page 45.
Standalone or inetd?
The IBM HTTP Server can be run in two modes: standalone and inetd. In
the inetd mode, the server only gets started when an HTTP request is
received. The inetd mode is not recommended and, therefore, not further
explained in this book.
The configuration file (httpd.conf) contains the directives that control how
the server runs, such as the user and group ID that the server runs under,
where the log files are written, the port it listens on, the location of
other files, and so on.
Port specifies the port number that the server listens on. The default port number
is 80 for Web servers. Note: If the port number is 1023 or below, the IBM
HTTP Server must be started as root. If any other port than 80 is used, it
must be specified in the URL for that server. For example, if port 8080 is to be
used instead, the corresponding lines in httpd.conf are:
Port 8080
Listen 8080
User defines the user ID (or UID) under which the server will run. Although
the server will be started as root in most cases, the actual HTTP request
servers run under a different user ID for security and other reasons. The user
nobody is the default value, but it is recommended to create a special user for
the server (see 3.6, “Initial Setup” on page 42). The corresponding directive
in httpd.conf for a user ID www would look like:
User www
If the server is started by someone other than root, this parameter is ignored.
Group is similar to the User directive explained above. It specifies the group
the HTTP server processes should run under. An example is:
Group www
ServerRoot specifies the absolute directory that serves as a root directory for
other files specified by their respective directives that do not contain an
absolute filename, such as the error and access log files. However, it is
recommended to use fully qualified filenames in all other directives that
involve files because relative filenames may easily be confusing. The default
is:
ServerRoot /usr/lpp/HTTPServer
ErrorLog specifies the filename for the error log file. The default is:
ErrorLog /usr/lpp/HTTPServer/var/log/error_log
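The relationship between ServerRoot and other file directives can be sketched as follows. This is an illustration of the two ways of naming the same file, assuming the default paths quoted above; a real configuration would contain only one of the two ErrorLog lines.

```apache
ServerRoot /usr/lpp/HTTPServer

# Relative name: resolved against ServerRoot, giving
# /usr/lpp/HTTPServer/var/log/error_log
ErrorLog var/log/error_log

# Fully qualified name: independent of ServerRoot (the form recommended above)
ErrorLog /usr/lpp/HTTPServer/var/log/error_log
```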
ServerName is how the IBM HTTP Server identifies itself in error messages
that are sent to the client after an error has occurred. It should specify a
hostname for the server that can be resolved to an IP address. Example:
ServerName www.CompanyA.com
The ClearModuleList directive that follows clears the list of modules that is
already compiled into the server by default (as with most configuration
options, the server comes with a compiled-in default module list that applies if
no other definitions are made in the configuration file).
The list of AddModule directives that follows enables the modules. Bear in
mind that the order might be important for a certain function to work properly.
You should consult the documentation of the respective module for more
information on the order.
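The sequence described above might look like the following sketch, which uses the standard Apache 1.3 directive syntax. The choice of modules and the libexec/ paths are assumptions for illustration; the shipped configuration file lists many more modules.

```apache
# Load the shared objects first (the libexec/ paths are assumptions)
LoadModule env_module     libexec/mod_env.so
LoadModule status_module  libexec/mod_status.so

# Discard the compiled-in list of active modules ...
ClearModuleList

# ... and activate modules explicitly, in the desired order.
# mod_so.c is re-added so that LoadModule itself remains active.
AddModule mod_env.c
AddModule mod_status.c
AddModule mod_so.c
```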
For historical reasons (as descendant of the NCSA httpd server), Apache
(and thus the IBM HTTP Server) supports three configuration files: httpd.conf,
access.conf and srm.conf. The NCSA httpd server used each of these files
for different configuration directives. In the current release of Apache (IBM
HTTP Server) all configuration is merged into one file, the httpd.conf
configuration file. Other files, though still supported, should only be used
when specific reasons require this.
Every directive line starts with the directive’s name, followed by optional or
mandatory parameters for that directive. Each directive must end on the
same line; a directive cannot be continued across multiple lines. A
parameter must be separated from the directive and other parameters by at
least one space or tab character.
Directives are not case sensitive. You can mix uppercase and lowercase
letters in them. Some parameters, however, are case sensitive, particularly
file and directory names. Case sensitivity of URLs can be removed by using
the spell checking feature of the IBM HTTP Server as explained in 5.9.1,
“Fixing Typos in URLs” on page 106.
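The difference between directive names and parameters can be illustrated with the following sketch. The lines are shown side by side for comparison only and are not meant to be used together; the /www/HTML directory is hypothetical.

```apache
# These two lines are equivalent: directive names are not case sensitive
DocumentRoot /www/html
documentroot /www/html

# These two lines are not equivalent: file and directory names
# are case sensitive, so they refer to two different directories
DocumentRoot /www/html
DocumentRoot /www/HTML
```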
A .htaccess file placed in a particular directory applies to that directory and all
its subdirectories. It is equivalent to a <Directory> section (see 4.8, “Sections”
on page 61) in the httpd.conf file.
For example, assuming the httpd.conf file contains the following section:
<Directory /www/html/public/support>
AllowOverride All
...
</Directory>
Whenever a document from within that directory is requested, the Web server
looks for the following files:
/.htaccess
/www/.htaccess
/www/html/.htaccess
/www/html/public/.htaccess
/www/html/public/support/.htaccess
If any of these files exist, the Web server reads them and applies their contents to
the currently effective configuration. Because this lookup takes place on
every such request, it affects server performance. Thus,
for performance reasons, it is recommended to use .htaccess files only when
really required. More about configuration processing can be found in 4.8.4,
“Sections Processing Rules” on page 64.
If AllowOverride is not specified at all, the server’s default is All for all
directories. The IBM HTTP Server therefore contains the following
directive in its default configuration file to prevent the server from searching
for such files:
<Directory />
AllowOverride None
...
</Directory>
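Between None and All, the standard Apache AllowOverride directive also accepts individual categories of directives. The following sketch assumes the /www/html/public/support directory from the earlier example and enables .htaccess processing for that tree only:

```apache
# Do not look for .htaccess files anywhere by default
<Directory />
AllowOverride None
</Directory>

# Look for .htaccess files only in this tree, and honor only
# authentication (AuthConfig) and host access (Limit) directives in them
<Directory /www/html/public/support>
AllowOverride AuthConfig Limit
</Directory>
```

Restricting the categories keeps the administration distributed while limiting what a .htaccess file can change.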
The .htaccess files can contain confidential information (for example the path
and filenames of authentication files). It is possible to restrict access to these
files from Web clients using the following section in the main configuration file
or .htaccess files themselves:
<Files .htaccess>
Order allow,deny
Deny from all
</Files>
4.8 Sections
The IBM HTTP Server has very flexible features to define configuration
parameters for individual URLs, directories and even single files. This can be
done by placing configuration directives into special sections (sometimes also
called containers or scopes) of the configuration file.
Sections can contain most of the supported configuration directives, including
some other sections. If in doubt, consult the online documentation for a
particular directive to find out whether or not it can be included in a section.
This section explains the three basic section types: <Directory>, <Files>, and
<Location>. A configuration file can also contain other sections, such as
<DirectoryMatch>, <LocationMatch> and <FilesMatch>, which are used for
regular expression matching. The sections <IfModule> and <IfDefine> are
used for conditional processing of directives. More about the <VirtualHost>
section can be found in 5.1, “Virtual Hosts” on page 71 and about the <Limit>
section in 5.6, “File Uploading” on page 95.
4.8.1 <Directory>
The <Directory> section is the most commonly used sort of section. It
contains configuration directives that apply to a specific directory and its
subdirectories. The directory can be specified by an absolute path, by any
string with wild-card characters (“?” for single character and “*” for any
sequence of characters) or by a regular expression. For security reasons and
for simplicity, it is recommended to always use absolute path names.
For example, the following section found in the default IBM HTTP Server
configuration file defines the configuration for the whole AIX filesystem
(although not all of it will be accessible by clients):
<Directory />
Options FollowSymLinks
AllowOverride None
</Directory>
It enables symbolic links (see 4.9, “Request Mapping” on page 66), but
disables automatic directory indexes (see 5.2, “Automatic Directory Indexing”
on page 80), server side includes (see 10.4, “Server-Side Includes” on page
219) and MultiViews (see 5.4, “Multiple Language Support” on page 87). The
above example also disables the use of .htaccess files (see 4.7.1, “.htaccess
and Performance” on page 59).
The following example can also be found in the default configuration file
shipped with the IBM HTTP Server:
<Directory /usr/lpp/HTTPServer/share/htdocs>
Options Indexes FollowSymLinks
AllowOverride None
</Directory>
It overrides the previous configuration for the root directory (see previous
example) for the /usr/lpp/HTTPServer/share/htdocs directory and its
subdirectories.
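The wild-card and regular expression forms mentioned at the beginning of this section can be sketched as follows, using standard Apache 1.3 syntax. All directory names are assumptions chosen for the example:

```apache
# Wild-card form: matches, for example, /home/user1/public_html
<Directory /home/*/public_html>
Options Indexes
</Directory>

# Regular expression form, introduced by "~"
<Directory ~ "^/www/html/(docs|manuals)">
AllowOverride None
</Directory>

# The same regular expression written with <DirectoryMatch>
<DirectoryMatch "^/www/html/(docs|manuals)">
AllowOverride None
</DirectoryMatch>
```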
4.8.2 <Files>
The <Files> section in the server configuration file is very similar to the
<Directory> section. The difference is that the <Files> section settings apply
to files according to file name match. You can also use wild-cards (like “?” for
single character and “*” for any sequence of characters) or regular
expressions in the file names.
For example, the following configuration file fragment forbids access to all
.htaccess files beneath /www/html:
<Directory /www/html>
AllowOverride All
<Files .htaccess>
Order Allow,Deny
Deny from All
</Files>
</Directory>
Unlike the <Directory> and <Location> sections, the <Files> section
can also be used in .htaccess files.
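As a sketch of the wild-card and regular expression forms of <Files>, assuming standard Apache 1.3 syntax (the .bak suffix is a hypothetical example):

```apache
# Wild-card form: applies to every file whose name ends in .bak
<Files *.bak>
Order allow,deny
Deny from all
</Files>

# Regular expression form, introduced by "~": applies to every file
# whose name starts with ".ht" (such as .htaccess)
<Files ~ "^\.ht">
Order allow,deny
Deny from all
</Files>
```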
4.8.3 <Location>
The <Location> section does not apply to directories and/or files, but to
requested URLs. For example, the following section allows access to URLs
/internal (and below) only from the IP network 1.2.3.*:
<Location /internal>
Order Deny,Allow
Deny from All
Allow from 1.2.3
</Location>
Although similar in appearance, the <Location> section is independent of the
directory structure. A <Location> section has no effect on requests that
reach the same directory through another URL. The following
example shows such a case:
DocumentRoot /www/html
<Location /internal>
Order Deny,Allow
Deny from All
Allow from 1.2.3
</Location>
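One way such a bypass could arise is through an alias. In the sketch below, the Alias line and the /staff URL are hypothetical additions for illustration and are not part of the example above:

```apache
DocumentRoot /www/html

# Hypothetical second URL path to the same directory
Alias /staff /www/html/internal

# Restricts only URLs that begin with /internal; requests for /staff/...
# reach the same files without matching this section
<Location /internal>
Order Deny,Allow
Deny from All
Allow from 1.2.3
</Location>

# A <Directory> section protects the files regardless of the URL used
<Directory /www/html/internal>
Order Deny,Allow
Deny from All
Allow from 1.2.3
</Directory>
```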
Directives found in the main section of the configuration file httpd.conf have
the lowest priority. They are overridden by directives found in <Directory>
sections, followed by those in .htaccess files. Directives within <Files>
sections and finally within <Location> sections have the highest priority.
<Directory /www/html/internal>
Order Deny,Allow
Deny from All
Allow from 1.2.3
</Directory>
For example, in the following configuration, the second section overrides the
first one:
<Directory /www/html/internal>
Order Deny,Allow
Deny from All
Allow from 1.2.3
</Directory>
<Directory /www/html/int*>
Order Deny,Allow
Allow from All
</Directory>
4.8.5 Recommendations on Sections Usage
Using nested sections and additional configuration directives can create
some confusion on how they are actually processed by the server. For this
reason, we suggest a few simple rules to keep your configuration clearer
(at least for yourself).
• Do not use <Location> sections at all (unless truly necessary). Almost
everything can be done with <Directory> sections.
• Use <Files> sections only when you really need them. In most cases, a
solution can be found by putting all these files into a separate directory
and using a <Directory> section.
• Use .htaccess files only when you need to have distributed administration
and configuration (see 4.7, “Distributed Configuration” on page 58). Avoid
using several .htaccess files in one directory path (for example,
/www/.htaccess and /www/html/.htaccess).
• Keep the number of sections in the configuration file to a minimum.
Sometimes rearrangement of the directory structure helps.
The basic directive that maps a URL to the filesystem of the Web server is the
DocumentRoot directive. For example, the directive
DocumentRoot /www/html
instructs the Web server to insert the directory /www/html in front of all
requested resources in URLs. To continue with this example, when the Web
server www.CompanyA.com gets a request with the URL
http://www.CompanyA.com/images/products.gif, it looks for the file
/www/html/images/products.gif. This requires that all files are stored
underneath one directory (except, of course, when using symbolic links within
the filesystem).
Another way to serve files from a different directory tree is the Alias directive. This
directive is implemented by the standard module mod_alias. For example, the
following directive maps all requests that begin with /download to the
directory tree /ftp/pub/download:
Alias /download /ftp/pub/download
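The mapping performed by DocumentRoot and Alias can be illustrated with a short Python sketch. It is a simplified model under our own assumptions (the function name map_url is hypothetical, and details such as URL decoding or symbolic links are ignored); it only shows that an Alias prefix is tried first and DocumentRoot is the fallback.

```python
def map_url(url_path, document_root, aliases):
    """Map the path part of a URL to a file system path.

    aliases is a list of (url_prefix, directory) pairs, checked in
    order, as with several Alias directives in a configuration file.
    """
    for prefix, target in aliases:
        # Match only on a path segment boundary, so that /download
        # does not accidentally capture /downloads.html.
        if url_path == prefix or url_path.startswith(prefix.rstrip("/") + "/"):
            return target + url_path[len(prefix):]
    # No alias applies: prepend the document root.
    return document_root + url_path


ALIASES = [("/download", "/ftp/pub/download")]

print(map_url("/images/products.gif", "/www/html", ALIASES))
print(map_url("/download/tools/readme.txt", "/www/html", ALIASES))
```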
The IBM HTTP Server can also tell a client to look for the
requested resource in another location. This feature is useful when a
document has been moved to a different location or server. The Redirect
directive that is used for such cases is also implemented by the mod_alias
module.
More information about the Alias and Redirect directives can be found in the
online documentation.
4.10 Options
The Options directive is used to enable some advanced features of the IBM
HTTP Server. Many of these features are explained in other sections in this
book in the context of their respective meaning. Here we briefly overview
each option and provide references to more detailed descriptions. Then, the
syntax of this directive and its usage in configuration files is explained.
Indexes Automatic directory indexing is enabled. See 5.2,
“Automatic Directory Indexing” on page 80 for
more details.
Includes Server side includes (SSI) are enabled. See 10.4,
“Server-Side Includes” on page 219 for more
details.
IncludesNOEXEC SSIs are enabled, but program execution inside
SSIs is disabled. See 10.4, “Server-Side Includes”
on page 219 for more details.
MultiViews The automatic multiple variants feature is enabled.
See 5.4, “Multiple Language Support” on page 87
for more details.
ExecCGI CGI program execution is enabled. See 10.2, “CGI
Programs” on page 214 for more details.
FollowSymLinks AIX file system symbolic link support is enabled;
the server follows symbolic links. See 4.9,
“Request Mapping” on page 66 for more details.
SymLinksIfOwnerMatch Acts similarly to the previous parameter, but with
additional restrictions. See online documentation
for more details.
All This option is equivalent to the directive
Options Indexes Includes ExecCGI FollowSymLinks
Please note that MultiViews is not included.
4.10.1 Syntax
The parameters after the Options directive can have the prefixes + or -.
These prefixes indicate that the result should be accumulated with other
Options directives. That applies to directives within single sections (such as
<Directory>) and directives from other sections or .htaccess files. For more
about directives processing, see Section 4.8.4, “Sections Processing Rules”
on page 64.
<Directory /www/html/demo>
Any option without a + or - prefix resets all previously set options. For
example, the following sequence of directives
Options +Includes
Options Indexes
Options +MultiViews
enables only Indexes and MultiViews, because the second line (Options Indexes),
which has no prefix, resets the previously set Includes option.
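The accumulation rule for the + and - prefixes can be modelled with a small Python sketch. This is our own simplified model of the behavior described in this section, not the server's implementation; the function name effective_options is hypothetical.

```python
def effective_options(directives, inherited=()):
    """Return the set of options active after a sequence of Options lines.

    Each entry in directives is the argument string of one Options
    directive, for example "+Includes" or "Indexes FollowSymLinks".
    """
    active = set(inherited)
    for directive in directives:
        tokens = directive.split()
        plain = [t for t in tokens if not t.startswith(("+", "-"))]
        if plain:
            # Any option without a prefix resets everything set so far.
            active = set(plain)
        for token in tokens:
            if token.startswith("+"):
                active.add(token[1:])
            elif token.startswith("-"):
                active.discard(token[1:])
    return active


# The sequence discussed in the text: Includes is lost on the second line.
print(sorted(effective_options(["+Includes", "Indexes", "+MultiViews"])))
```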
Chapter 5. Advanced Configuration
This chapter explains the most commonly used, advanced features of the IBM
HTTP Server and provides some examples on how to use and customize
them. It would be beyond the scope of this book to describe all features of the
IBM HTTP Server since some of them are seldom used and the list of
available functions is almost endless, given the number of generally available
modules. For additional information on advanced features, including
seldom-used functions that are not described here, we refer you to the online
IBM HTTP Server documentation, which is shipped with the product, or to the
Apache documentation at http://www.apache.org.
The IBM HTTP Server ships with most of the commonly available modules.
Should you need to add other modules that are not shipped with the
product, you should also read Chapter 8, “Building HTTP Server Modules” on
page 177.
Each IP-based virtual host must have its own IP address (see Figure 5 on
page 73). Usually the computer running the virtual hosts has multiple IP
addresses assigned to a single network interface adapter if the operating
system supports this (or it might optionally have multiple adapters). IP-based
virtual hosts provide a transparent solution to any browser; the function does
not rely on any specific browser functionality and therefore tends to be the
preferred method for many sites that implement virtual hosts.
From the browser’s view, there is no difference between a virtual host and a
real host; both have their own hostname (as included in the URL) and
associated IP address.
The key to handling IP-based virtual hosts is the server’s ability to handle
multiple IP addresses, be it on one single network interface or on multiple
interfaces. The Web server software, on the other hand, needs to be able to
distinguish and handle separate requests accordingly.
[Figure 5: a real host (names www.CompanyA.com and www.CompanyB.com,
addresses 1.2.3.4 and 1.2.3.5) serving two virtual hosts: www.CompanyA.com
at address 1.2.3.4 and www.CompanyB.com at address 1.2.3.5]
physical Web server have the same IP address, but are distinguished by the
server name in the HTTP protocol header that the browser sends to the
server (see Figure 6). Name-based virtual hosts rely on the HTTP Version 1.1
protocol implementation which may not be supported by older browsers.
[Figure 6: a real host (names www.CompanyA.com and www.CompanyB.com,
address 1.2.3.4) serving two virtual hosts, www.CompanyA.com and
www.CompanyB.com, both at address 1.2.3.4]
The key in name-based virtual hosts is that the clients’ requests, which are
being routed to the same physical interface with the same IP address, carry
the hostnames in their headers such that the Web server software can
distinguish them. This feature was introduced only with the HTTP Version 1.1
protocol, which must be supported by both the server and the browser.
The IBM HTTP Server supports both IP-based and name-based virtual hosts
on the same computer. The next sections explain how these can be set up.
5.1.3 Setting It Up
The setup of IP-based and name-based virtual hosts has much in common, so
we do not separate them, but explain the differences in each step.
As we have seen, each IP-based virtual host must have its own IP address.
Multiple IP addresses can be added to a system by either installing multiple
network adapters or by assigning multiple IP addresses to a single network
interface. To assign an additional IP address to an existing and configured
network interface on AIX, use the command ifconfig with the alias option.
The following example shows how to assign an additional IP address (1.2.3.4
with network mask 255.255.255.240) to the Ethernet interface en0:
# ifconfig en0 1.2.3.4 netmask 255.255.255.240 alias
For each name-based virtual host, you must have a DNS record (alias, also
known as canonical name, CNAME) pointing to the same IP address. Such a
configuration is usually done by a DNS administrator on a different computer
that serves as a DNS server, and the details depend on the individual DNS
setup.
Either way, for IP-based and name-based virtual hosts, you should check that
the newly introduced hostnames correctly resolve into the assigned IP
addresses. This can easily be done, for example, with the host <hostname>,
ping <hostname>, or nslookup <hostname> commands on AIX.
...
</VirtualHost>
You can use most of the IBM HTTP Server directives inside the <VirtualHost>
section, but some of them are highly recommended:
• DocumentRoot – This directive should be here because that is most likely
the reason for using a virtual host to serve a separate document tree
under another server name.
• ServerName – This directive is useful for performance and availability
reasons because, when used, the server does not need to do a DNS
lookup to find its name. It is not needed, however, when the server’s name
is specified in the <VirtualHost> directive (see above).
• TransferLog and ErrorLog – These directives specify separate log files for
such a virtual server. For details, see 5.7, “Logging” on page 97.
• ServerAdmin – Specifies a different e-mail address for each virtual host. This
e-mail address can be automatically appended to error messages in order
to give users some help in case of errors. See 5.5.1, “Customizing Error
Messages” on page 92 for more information on using e-mail addresses in
error messages.
If you decide to use name-based virtual hosts, you must use the
NameVirtualHost directive (outside of the <VirtualHost> section). This
directive specifies the IP address which will be used for name-based virtual
hosts. In this case, a corresponding section of a configuration file could look
like:
Do not forget to grant appropriate access rights for document directories with
appropriate <Directory> directives. More about directory access rights and
related directives can be found in 6.2, “Basic Authentication” on page 118.
Regarding the example above, note that the server aliases only work as long
as the hostname supp resolves to the IP address 1.2.3.4.
5.1.4 Testing
After changing the server configuration file as shown, it is recommended that
you verify the configuration file syntax with the apachectl configtest or the
httpd -t command. If there are errors, the command will notify you
immediately. If the command reports “Syntax OK”, you can restart the server
with the command apachectl graceful in order to read and apply the new
configuration to the running server.
After successful restart of the server, try to access the new virtual servers
from a browser. Make sure that the Web server provides the correct pages for
each virtual host. If problems arise with name-based virtual hosts, they may be
caused by the browser if it does not fully support the HTTP/1.1 protocol
standard. If you suspect such a problem, try a newer version or another
product. Most current Web browsers support HTTP/1.1, but not all.
5.1.5 Logging
If you have virtual hosts set up, you will probably want separate log
files for each of them. The IBM HTTP Server allows you to do this by using the
TransferLog and ErrorLog directives in the <VirtualHost> section. If you do
not specify these directives, the server writes all log records into a single log
file specified in the main section of the configuration file. (However, there are
tools available on the Internet that allow you to split a single log file into several
log files, along with other functionality to analyze log files.)
Note on mod_rewrite
The module mod_rewrite is actually very powerful and supports functions
far beyond the single use that will be explained. For more information
about this module, refer to 2.3.2, “Translation Modules” on page 21 or the
original Apache documentation.
When a client request does not include a server name, the IBM HTTP Server
uses the first <VirtualHost> section that matches the IP address. It is
therefore recommended to create a special <VirtualHost> section as the first
that always returns a list of all available virtual hosts with references to the
separate subdirectories of each virtual host (see example shown in Figure 7,
where an old version of Microsoft Internet Explorer was used that did not
support the HTTP/1.1 protocol). It is then the user’s choice to select the
correct page.
<VirtualHost 1.2.3.4>
DocumentRoot /www/html
RewriteEngine On
RewriteRule ^/.* /www/html/index.html
</VirtualHost>
<VirtualHost 1.2.3.4>
DocumentRoot /www/html/CompanyA
ServerName www.CompanyA.com
ServerPath /com_a
...
</VirtualHost>
<VirtualHost 1.2.3.4>
DocumentRoot /www/html/CompanyB
ServerName www.CompanyB.com
ServerPath /com_b
...
</VirtualHost>
Important Note
In order to make everything work, use either relative references between
documents belonging to the same virtual host (for example ../img/logo.gif),
or include the full corresponding subdirectories (for example
/com_a/img/logo.gif). Do not use absolute references (like /img/logo.gif),
because in that case the server will return the virtual hosts list again.
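The selection logic described in this section can be summarized in a short Python sketch. It is a simplified model based on our reading of the configuration above, not the server's code: the function name select_vhost and the dictionary keys are hypothetical, and only the hostname and ServerPath checks are modelled.

```python
# One entry per <VirtualHost 1.2.3.4> section, in configuration order.
VHOSTS = [
    {"name": None, "path": None, "root": "/www/html"},
    {"name": "www.CompanyA.com", "path": "/com_a", "root": "/www/html/CompanyA"},
    {"name": "www.CompanyB.com", "path": "/com_b", "root": "/www/html/CompanyB"},
]


def select_vhost(vhosts, host_header, url_path):
    """Pick the virtual host that should serve a request."""
    if host_header:
        # HTTP/1.1 style request: match the server name, ignoring a port.
        wanted = host_header.split(":")[0].lower()
        for vhost in vhosts:
            if vhost["name"] and vhost["name"].lower() == wanted:
                return vhost
    else:
        # Older browser without a server name: fall back to ServerPath.
        for vhost in vhosts:
            path = vhost["path"]
            if path and (url_path == path or url_path.startswith(path + "/")):
                return vhost
    # Nothing matched: the first section acts as the default.
    return vhosts[0]


print(select_vhost(VHOSTS, "www.CompanyB.com", "/")["root"])
print(select_vhost(VHOSTS, None, "/com_a/img/logo.gif")["root"])
print(select_vhost(VHOSTS, None, "/img/logo.gif")["root"])
```

The last call shows why absolute references such as /img/logo.gif lead back to the default section when the browser sends no server name.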
Here is an excerpt of a possible configuration file that specifies fancy
indexing:
<Directory /www/html/demo>
Options Indexes
IndexOptions FancyIndexing
...
</Directory>
Adding text to an index can be done by using the directives HeaderName and
ReadmeName. They specify names of files that are to be included into the
generated HTML document representing the index. These files can be plain
text files or HTML fragments. For example, the directive ReadmeName
Right after installation, the IBM HTTP Server’s configuration file includes the
following directive in its main section:
IndexIgnore .??* *~ *# HEADER* README* RCS
This default directive excludes the following files from all indexes:
• Files whose names begin with a dot and are longer than two characters
(that means that the parent directory is not excluded)
• Files whose names end with either the tilde (~) or the pound (#) character
• Files whose names begin with HEADER or README (usually appended
to the directory index)
• Files with the name RCS (Revision Control System)
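The effect of these wildcard patterns can be tried out with Python's fnmatch module, whose ? and * wildcards behave like the ones used here. This is only a sketch to explore the patterns; the function name is_ignored is our own, and the server's matching may differ in details.

```python
from fnmatch import fnmatchcase

# The arguments of the default IndexIgnore directive.
PATTERNS = ".??* *~ *# HEADER* README* RCS".split()


def is_ignored(name, patterns=PATTERNS):
    """Return True if a file name matches any IndexIgnore pattern."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


for name in ["..", ".htaccess", "index.html", "index.html~", "README.txt", "RCS"]:
    print(name, is_ignored(name))
```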
Syntax of IndexOptions
The options in the IndexOptions directive can have the prefixes + or -.
These prefixes indicate that the result should be accumulated with previous
IndexOptions directives. That applies to options within a single section (such
as <Directory>) and to options inherited from higher-level sections. For more
about directives processing in sections, see 4.8.4, “Sections Processing Rules”
on page 64.
<Directory /www/html/demo/images>
IndexOptions -SuppressSize +SuppressDescription
</Directory>
Any option without a + or - prefix resets all previously set options. For
example, the following sequence of directives:
IndexOptions +SuppressDescription
IndexOptions SuppressSize
IndexOptions +SuppressLastModified
It can be used in the main configuration file, but a much more convenient
place to use it is the local .htaccess file in that directory.
The IBM HTTP Server supports this feature with the module mod_userdir.
This module is by default included in the appropriate LoadModule and
AddModule sections of the httpd.conf file, thus no action is required to enable
it. The directive UserDir defines where the Web server must look for user
files. A user directory is specified by using the ~ (tilde) prefix in front of a
user’s name in a URL (for example: http://www.CompanyA.com/~joe/).
The use of the user directory feature also has some security aspects to
consider. For example, if you specify UserDir ./, the whole file system could
become accessible through the URL /~root (assuming root’s home directory
is /). That is why the UserDir directive also supports the keywords enabled
and disabled. Using these, access to root’s home directory can explicitly be
disabled by using the following:
UserDir disabled root
Another security hole can be exposed by faulty CGI programs and
server-side includes in a user’s directory. The IBM HTTP Server allows you to
disable CGI programs and server-side includes by using the directive Options
-ExecCGI -Includes in the server configuration file. Another option,
Options IncludesNOEXEC, allows server-side includes, but the #exec
command and #include of CGI scripts are disabled.
To read more about CGI programs and server-side includes, see Chapter 10,
“Web Applications” on page 213. More about security can be found in
Chapter 6, “Deploying Security” on page 111.
The IBM HTTP Server accesses users’ home directories using the AIX
operating system APIs. This makes the location of the directories and files
transparent to the server and it does not care where these directories are
physically located. If the users’ home directories are located on some sort of
a distributed file system (such as NFS, AFS or DCE/DFS), they can be on
a separate machine.
Using a distributed file system adds more flexibility, but at the same time
may raise some performance and availability concerns. The main advantage
of such a configuration is the ability for users to publish their files without the
necessity for any special file transfer to the Web server machine. HTML file
editing can be done locally on each user’s workstation and, after saving a file
to the specified directory (which appears to be local on the user’s
workstation), it is instantaneously available on the Web. On the other hand, a
Web server cannot access and serve these files if network problems
arise or if the machine on which the files are physically located experiences
a problem.
Multiple language support is a part of the broader IBM HTTP Server feature
called Content Negotiation which allows a server and browser to negotiate
document language, encoding and media type (for example, a client can
specify which video format it prefers: MPEG, MOV, or AVI). But this feature is
not widely used because most browsers do not fully implement the content
negotiation feature.
This section covers only language negotiation because it is most useful and
most widely supported. Encoding and media type negotiation work similarly,
and the following discussion on language negotiation applies to them as well.
A type map file lists files and associated content information separated by
empty lines.
URI: garantie.html
Content-type: text/html
Content-language: de
This gives the user the choice of selecting the appropriate document (which
should certainly be avoided by proper configuration of browsers and servers).
5.4.1.2 MultiViews
The MultiViews feature allows you to build variant lists automatically
according to file extensions. The MultiViews feature can be enabled by the
These directives associate file extension ".en" with the English language and
".de" with the German language. They also assign the English language
documents a higher priority when clients do not specify any language priority.
Now, if there are files info.html.en and info.html.de in the document root
directory, the request http://www.CompanyA.com/info.html will return the
appropriate document according to the browser’s configuration. If it is not
possible to find an appropriate document (for example, if the browser accepts
only French documents), the IBM HTTP Server returns a list of available
variants, as shown in Figure 12.
While the screen shown in Figure 12 is certainly not desirable for a user, it
shows you the basic operation of MultiViews in a non-working configuration.
5.4.2 Browser Configuration
Users who want to use the language negotiation function of the HTTP
protocol must set up their Web browser’s language preferences. Most popular
browsers support this feature. The following examples show how to set up
multiple language support in Netscape Navigator, Netscape Communicator
and Microsoft Internet Explorer.
In Microsoft Internet Explorer (Version 4), users can access the language
preferences from: View -> Internet Options... -> General -> Languages....
An example is shown in Figure 14 on page 91.
Although the negotiation feature was supported in the HTTP protocol prior to
Version 1.1, many browsers did not support it in earlier versions. The HTTP
protocol also provides the option of assigning weights to each variant to
further tune the negotiation process between server and browser (which is
not detailed here any further).
Figure 15. Standard Error Message
There are a number of situations when the IBM HTTP Server returns an error
message. Each of these errors has an assigned number, a so-called error
code. A full list of error codes can be found in the HTTP protocol specification
(RFC 1945 for Version 1.0 and RFC 2068 for Version 1.1; see B.3, “Other
Publications and Links” on page 228 for information on how to access RFCs).
The first option is the default. It takes place if no other option is specified. You
can, however, specify that the e-mail address specified with the ServerAdmin
directive be included automatically as a hotlink in the default error message.
For example, we assume that the following two lines are included in the
server’s configuration file:
...
ServerAdmin [email protected]
...
ServerSignature email
...
This causes the IBM HTTP Server to include a hotlink in its default error
messages that a user can click to send an e-mail to the specified
address at [email protected] (see Figure 16; notice the
underlined hotlink in the footer of the error page).
The following example shows how to define a custom error message using an
ErrorDocument directive in the server configuration file:
ErrorDocument 404 "The requested resource cannot be found on this server.
Please return to our <A HREF="http://www.CompanyA.com">home page</A>.
(Note that the above example is a single line that has been split for
representation.)
Syntax of the ErrorDocument Directive
The ErrorDocument directive has a syntax that sometimes misleads
webmasters. One quotation mark (") after the error code is required to
indicate the beginning of the message; no closing quotation mark is
required. In fact, any further quotation mark is treated as part of the
message and is shown along with the error message.
Figure 17 shows the result of the custom error message defined above.
As can be seen in the example above, HTML tags can be used in customized
error messages. However, some browsers may not interpret them correctly
because the message does not have a standard HTML header.
The third option can be configured by placing a local or remote URL after the
respective error code behind the ErrorDocument directive. That URL can
point to either a static HTML file or to a script that processes errors. Local
URLs begin with a slash (/) and remote URLs include protocol and host
name. Here are some examples of error redirections:
ErrorDocument 401 http://www.CompanyA.com/security.html
ErrorDocument 403 /forbidden_pages.html
ErrorDocument 404 /cgi-bin/search.pl
You can use different error handling directives for each virtual host, directory
or location. You can also use them in the .htaccess files for individual
directories.
Server side includes allow you to have a common page layout for all
languages and the content negotiation feature selects the corresponding
language.
Further information on this tricky setup can be found at
http://www.apache.org/docs/misc/custom_errordocs.html.
While the HTTP protocol includes ways to upload files from a Web browser to
a Web server using either the PUT or POST mechanism, there is no standard
way of handling such requests on the server side. Thus, the IBM HTTP
Server does not include any handler that can process such uploads. You can,
however, find them on the Internet or write a small program yourself. PUT and
POST requests can be processed by either CGI programs or Web server
modules. The configuration is different in each case, but their common
purpose is to receive a file from the client, while at the same time tightening
security as much as possible.
The following example shows how to configure the IBM HTTP Server to use a
Perl script called /www/put-cgi/put.pl for file uploading to the directory tree
/www/html using basic authentication and restriction by IP address:
<Directory /www/html>
Script PUT /put-cgi/put.pl
</Directory>
<Directory /www/put-cgi>
AuthName "Web Publishing"
AuthType Basic
AuthUserFile /www/passwords/put.pwd
Order deny,allow
Deny from All
Allow from 1.2.3.6
Require valid-user
Satisfy All
</Directory>
The PUT method can also be handled by server modules, rather than a CGI
program. In that case, the configuration is dependent on that module’s
capabilities, but the security considerations and configuration basically
remain the same. You can find such a module by searching the Web site
http://modules.apache.org for keyword “put”.
More about using the PUT method can be found in the Apache Week article
“Publishing Pages with PUT” at http://www.apacheweek.com/features/put.
Another method of file uploading is the POST method. You also need at least
a CGI program or a module to handle this HTTP method. One of the most
widely used examples of such a file uploading method is through Microsoft’s
FrontPage Server Extensions.
5.7 Logging
The IBM HTTP Server provides good logging options to track user behavior,
server usage and find problems that are related to Web pages. The figures of
Web server usage are often needed for marketing purposes. There are lots of
free applications on the Internet to parse the log files to fit into presentation
formats. In this section we will concentrate on showing how you can log every
useful detail in the Web servers to log files and how to read them. Many of the
problems and configuration errors are easily solved if the logging is defined
properly. In the IBM HTTP Server, the log files are ASCII text files that can be
viewed with basic UNIX tools.
The first field shows the IP address of the client accessing the server. It can
also be the IP address of a firewall or a proxy server the client uses to access
the Internet. In this example, the Web server was not configured to resolve
the hostname and domain of the client, which is advisable for performance
reasons. The more efficient way to resolve the hostnames is to use the
logresolve program that is shipped with the IBM HTTP Server. The logresolve
program is introduced in Section 7.1.3.4, “Resolution and Mapping” on page
162. If the server had looked up the DNS hostnames, the IP address would
have been replaced with a hostname (if defined in DNS). DNS lookups are
controlled with the HostnameLookups (on | off | double) directive. The
value double should not be used unless it is unavoidable. When double DNS
lookups are defined, the server does an additional lookup for the IP address
of the hostname it has resolved using the client’s IP address. Double DNS
lookups are done regardless of the value of the HostnameLookups directive
when the resources in a protected area are requested. If double DNS
resolution fails, that is, the IP address of the connecting client and the IP
address resulting from a double lookup do not match, the access is denied.
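The double lookup check can be expressed in a few lines of Python. The sketch below only illustrates the idea described above; the function name double_lookup_ok is our own, and the two resolver functions are passed in as parameters so that the example runs without a DNS server (the sample hostname and table contents are made up for illustration).

```python
def double_lookup_ok(client_ip, reverse_lookup, forward_lookup):
    """Return True if the client passes a double DNS lookup.

    reverse_lookup maps an IP address to a hostname (or None);
    forward_lookup maps a hostname to a list of IP addresses.
    """
    hostname = reverse_lookup(client_ip)
    if hostname is None:
        return False  # no reverse record at all
    # The forward lookup must lead back to the connecting address.
    return client_ip in forward_lookup(hostname)


# Hypothetical DNS data standing in for real resolver calls.
REVERSE = {"1.2.3.4": "client.CompanyA.com", "9.9.9.9": "client.CompanyA.com"}
FORWARD = {"client.CompanyA.com": ["1.2.3.4"]}

print(double_lookup_ok("1.2.3.4", REVERSE.get, lambda h: FORWARD.get(h, [])))
print(double_lookup_ok("9.9.9.9", REVERSE.get, lambda h: FORWARD.get(h, [])))
```

The second client claims a hostname through its reverse record that does not resolve back to its own address, so the check fails.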
The second field of a transaction log entry is usually just a hyphen (-). The
hyphen in the log files represents that the information is not available. The
field is for Ident information. Ident is a protocol (RFC 931) based on the
presumption that the Ident daemon is running on the client machine. The
Ident daemon is usually not implemented (or configured) on client machines,
so the benefit is not worth the overhead it causes in the Web server. The
checking of the client’s Ident information can be controlled with the directive
IdentityCheck (on | off). The IdentityCheck directive defaults to off.
The third field displays the user name of the client, if the client user has
authenticated to the Web server using the Web authentication methods.
Remember that this file easily reveals the authorized Web users of the
system. If the file is not protected properly, anyone can produce a list of
possible user IDs with a command such as (<log file> is the server’s access
log file):
# cat <log file> | cut -d' ' -f3,9 | grep -v 401 | grep -v '-'
The date, time and timezone are enclosed in brackets. The date
representation consists of the day number, a three-letter abbreviation for the
month and a four-digit number for the year. The language of the abbreviation
for the month depends on the language environment used on the server
machine. The time is presented in 24-hour format: hours, minutes and
seconds, separated by colons. The timezone is presented with a +/- sign and
four digits without a separator (+/-hhmm).
Within the double quotes is the HTTP request that the client sent. Usually it
begins with GET, which is the basic method to request a file from the server.
The file name in the log file is written as the browser sees it. That is, the
location is not yet parsed to match the actual file name.
The following three-digit field (the second to last) represents the status
code of the operation. The most commonly seen status codes are as follows:
Table 9. Commonly Seen Status Codes in HTTP Requests
304 Not Modified - A Web server responds with this status code to a client’s
conditional GET request, where the client asked whether it can use the
cached copy of the requested resource.
400 Bad Request - Client has performed a request with a malformed syntax.
401 Unauthorized - The Web server denies access because it has some
access limitations defined for the requested resource (see 6.2, “Basic
Authentication”).
404 Not Found - The server was not able to find the requested resource.
405 Method Not Allowed - The Web server configuration denies the use of
this method.
500 Internal Server Error - The Web server encountered an internal error,
which prevented it from fulfilling the request.
503 Service Unavailable - The Web server can give that status code, if the
defined limit of users is exceeded.
A more complete list of status codes can be found in the RFC document for
HTTP Version 1.1 (http://www.w3.org/Protocols/rfc2068/rfc2068).
The last field in the transfer log entry is the count of transferred bytes. This
does not include the header parts in the response.
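The fields described above can be extracted from a log entry with a regular expression. The following Python sketch is one possible way to do this and is not part of the product; the names CLF_PATTERN and parse_clf are our own, and the sample entry is the robots.txt request shown in the Robot Exclusion note of this section.

```python
import re

# host, ident, user, [date], "request", status, bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_clf(line):
    """Split one Common Log Format entry into a dictionary of fields."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None  # not a well-formed CLF entry
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A hyphen means that no bytes were transferred.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


SAMPLE = ('204.123.9.20 - - [11/Nov/1998:15:24:25 -0600] '
          '"GET /robots.txt HTTP/1.0" 404 336')
print(parse_clf(SAMPLE))
```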
Robot Exclusion
Sometimes you might see a log file entry like this:
204.123.9.20 - - [11/Nov/1998:15:24:25 -0600] "GET /robots.txt HTTP/1.0"
404 336
A robot is a program that traverses Web sites on the Internet and indexes
the keywords it finds into its own database. Robots are typically used to
update page indexes of search engines. The machine in the example
above is called scooter and it is one of Altavista’s Web robots.
robots.txt is a file that you can add to your document root to define
restrictions for robots: what they should and should not index.
You might want to prevent robots from indexing some frequently updated pages.
The file robots.txt must be located directly under document root and the
access rights should permit anyone to read it.
The structure of error log entries differs from CLF (Common Log Format).
Below is an excerpt from an example error log:
[Fri Nov 6 14:24:48 1998] [error] [client 1.2.3.4] File does not exist:
/usr/docs/status
In each error log entry, the date and time are enclosed in brackets first, and
the following field describes the severity of the event. If a client was involved
in the event, its IP address (or hostname) is contained in the next field. The
explanation part begins with the name of the module where the event
occurred. If the event occurred in the server core, the field begins with “httpd”
or just the message text.
ErrorLog messages can also be directed to the syslog daemon that handles
all the system error messages on AIX. This opens up the possibility of combining
the IBM HTTP Server with system management software such as Tivoli TME.
By default, the directive uses syslog facility local7, but the facility can be
overridden by adding a colon and the facility name to the directive. For more
information about the syslog daemon, see the online manual page for syslogd or
the AIX manuals. The syslog logging can be turned on by defining the directive
ErrorLog as shown below:
ErrorLog syslog
or
ErrorLog syslog:local7
Error log files can also be rotated using the rotatelogs command. The use of
rotatelogs is described later in 5.7.4, “Rotating the Server Logs” on page 103.
While the TransferLog directive accepts only a file name for logging
transactions, the CustomLog directive also accepts formatting information or
a nickname for predefined formats as shown below:
TransferLog /var/log/http/access_log
CustomLog /var/log/http/custom_log "%h %l %u %t \"%r\" %s %b"
We have defined a couple of log formats that are used in the CustomLog
directives. We have replaced the TransferLog directive with the CustomLog
directive that was shown earlier. We made a special log file for logging all the
404 (Not Found) errors. The log format can be defined so that it will produce
an event only for requests that end up with a certain status value. This
condition is defined in the 404_requests LogFormat by setting the status code
just after the percentage character in the %{Referer}i argument.
The example below is an excerpt from the 404_log file defined in the previous
example. It reveals that the HTML page new_prod.html contains a broken link.
The second row is probably the same client trying to reload the missing
document, since the entry has no referrer field.
[14/Dec/1998:16:50:39 -0600] "http://www.CompanyA.com/new_prod.html" "GET
/Docs/Serv/hot/33823_rel_notes.html HTTP/1.0"
[14/Dec/1998:16:50:42 -0600] "-" "GET /Docs/Serv/hot/33823_rel_notes.html
HTTP/1.0"
The last log definition example shown above logs all the events that have not
ended successfully. This is done by excluding the successful status codes in
the %h argument. The exclusion is done by adding an exclamation mark (!)
before the status code or the list of status codes. The status codes can be
added as a condition to any % argument in the syntax. Here are the
arguments that can be used in log format definitions:
Table 10. Custom Log Format Arguments
Argument      Definition
f             Requested filename
h             The IP address of the client (or hostname, if DNS lookups are
              used)
p             The port number that was used to make a connection to the server
r             The request
{TIMEFMT}t    Time in customized time format. Time format must follow the
              rules that are specified for strftime system command.
u             Remote user name that the client has given the server
There are a number of free log analysis tools available on the Internet that
you might find helpful, for example http-analyze (http://freshmeat.net).
Because the IBM HTTP Server can redirect logging to a UNIX pipe, it is
possible to use an external application to switch the log files. The
rotatelogs application that ships with the IBM HTTP Server can be found in the
directory /usr/lpp/HTTPServer/sbin. It takes two arguments: a log file name
and a timer value that gives the time in seconds after which the log file
switch should be done. rotatelogs creates the log file and adds the system
time at which the server is started as an extension to the log file name. It
creates
The timer value is given in seconds, so a value of 86400 represents 24 hours.
In case you need to change the server logs manually, for instance, because
you are running out of free disk space in a file system, you can do this by
copying the log file to another location (or to tape archive) and then emptying
the file without recreating it, as shown below:
# cp /usr/lpp/HTTPServer/var/log/access_log /archive/access_log.121298
# cat /dev/null > /usr/lpp/HTTPServer/var/log/access_log
Note: Although this procedure frees the allocated space on disk for the
particular log file, the ls command may still report the file to be large. To avoid
such confusion, this procedure should not normally be used in daily
operations.
5.8 Auditing
Logs are of little use unless they are reviewed every now and then or
saved for future examination, in case it becomes necessary. The Web server
logs reveal, for instance, malfunctions in Web sites, such as broken links
and certificate expiries. Logs can also be used to compensate for the lack
of control in Web server authentication. In Web authentication there is
usually no way to protect the system from brute-force password cracking.
Since HTTP communication is stateless, it is difficult to detect that two
failed authentication attempts have anything in common with each other.
Let’s say that you tolerate only two failed access attempts per day and you
have collected all the 401 events to a log file. The log file could look
something like this:
[14/Dec/1998:08:47:16 -0600] 1.2.3.6 -
[14/Dec/1998:08:47:17 -0600] 1.2.3.6 goofy
[14/Dec/1998:09:01:13 -0600] 123.3.3.12 -
[14/Dec/1998:09:01:19 -0600] 123.3.3.12 dale
[14/Dec/1998:09:30:03 -0600] 1.2.3.6 -
[14/Dec/1998:09:44:16 -0600] 1.2.3.6 -
[14/Dec/1998:11:02:32 -0600] 1.2.3.6 goofy
[14/Dec/1998:13:40:22 -0600] 127.0.0.1 -
[14/Dec/1998:13:47:16 -0600] 127.0.0.1 -
[14/Dec/1998:13:47:24 -0600] 127.0.0.1 goody
[14/Dec/1998:15:47:44 -0600] 1.2.3.6 -
[14/Dec/1998:15:47:49 -0600] 1.2.3.6 ihsadm
[14/Dec/1998:17:34:44 -0600] 1.2.3.6 -
[14/Dec/1998:22:34:52 -0600] 209.3.244.2 -
[14/Dec/1998:22:34:55 -0600] 209.3.244.2 ihsadm
[14/Dec/1998:22:34:59 -0600] 209.3.244.2 ihsadm
[14/Dec/1998:22:35:07 -0600] 209.3.244.2 ihsadm
[14/Dec/1998:22:35:10 -0600] 209.3.244.2 ihsadm
[14/Dec/1998:22:35:14 -0600] 209.3.244.2 ihsadm
From this kind of log, you could easily notice that someone has tried to guess
the password of ihsadm. A script could detect that the failed authentication
count is exceeded for this user.
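A script of this kind could look like the following Python sketch. It assumes
that each line has the layout shown above (time, client address, user ID) and
applies the limit of two failed attempts per day; the function name
users_over_limit and the shortened sample are illustrative assumptions.

```python
import re
from collections import Counter

# [day:time zone] host user, as in the 401 log excerpt above
LINE_RE = re.compile(
    r"^\[(?P<day>[^:]+):[^\]]+\]\s+(?P<host>\S+)\s+(?P<user>\S+)$"
)

def users_over_limit(lines, limit=2):
    """Count failures per (day, user) and return those above the limit."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None or match.group("user") == "-":
            continue  # skip malformed lines and attempts without a user ID
        counts[(match.group("day"), match.group("user"))] += 1
    return {key: n for key, n in counts.items() if n > limit}

SAMPLE = [
    "[14/Dec/1998:08:47:17 -0600] 1.2.3.6 goofy",
    "[14/Dec/1998:11:02:32 -0600] 1.2.3.6 goofy",
    "[14/Dec/1998:22:34:52 -0600] 209.3.244.2 -",
    "[14/Dec/1998:22:34:55 -0600] 209.3.244.2 ihsadm",
    "[14/Dec/1998:22:34:59 -0600] 209.3.244.2 ihsadm",
    "[14/Dec/1998:22:35:07 -0600] 209.3.244.2 ihsadm",
]
print(users_over_limit(SAMPLE))
```

Counting per user ID rather than per client address catches an attacker who
switches machines; counting per address instead would be an equally valid
choice, depending on what you want to detect.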
Note: The author(s) of mod_speling demonstrated the need for such a
module by misspelling its name; the module name is correctly written
mod_speling, not mod_spelling.
For example, assume there is a file support.html. All the following requests
would cause the server to correctly access and return that file:
• suport.html (one character is missing)
• suppoprt.html (one character is added)
• suppotr.html (order of two adjacent characters is changed)
• sypport.html (one character is wrong)
• SupporT.HtmL (wrong case)
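The five kinds of mistakes listed above can be expressed as a small matching
function. The Python sketch below is a simplified approximation written for
this illustration; it is not the code of mod_speling, and the function name
is_near_match is an assumption.

```python
def is_near_match(requested, actual):
    """True if the names differ only in case or by one small typing error."""
    a, b = requested.lower(), actual.lower()
    if a == b:
        return True  # identical apart from upper and lower case
    if len(a) == len(b):
        diffs = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diffs) == 1:
            return True  # one wrong character
        # two adjacent characters swapped
        return (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and a[diffs[0]] == b[diffs[1]]
                and a[diffs[1]] == b[diffs[0]])
    if abs(len(a) - len(b)) == 1:
        shorter, longer = sorted((a, b), key=len)
        # one character missing or added
        return any(longer[:i] + longer[i + 1:] == shorter
                   for i in range(len(longer)))
    return False

for name in ("suport.html", "suppoprt.html", "suppotr.html",
             "sypport.html", "SupporT.HtmL"):
    print(name, is_near_match(name, "support.html"))
```

A server applying such a check has to compare the requested name against the
entries of the directory, which is where the cost of the feature comes from.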
The main drawback of this URL-fixing feature is its impact on the server’s
performance because the file system needs to be scanned on each request.
If two or more similar files are found, the server returns a list of these
files and asks the user to select one. That can expose unwanted file names
and can be considered a security flaw.
The URL correction feature can be enabled using the directive CheckSpelling
in the server configuration file. The required module (mod_speling) is enabled
by default with a corresponding LoadModule and AddModule directive.
The proxy function is defined in the HTTP protocol specification. A proxy Web
server acts as server and client at the same time. It receives proxy requests
from a client (browser) and forwards these requests to a destination server as
if it were a client itself. The answer received is then returned to the client.
Since all information passes through the proxy server, it can do logging and
caching.
The required mod_proxy is not included with the IBM HTTP Server (see 3.1,
“Product Contents” on page 33). Please read Chapter 8, “Building HTTP
Server Modules” on page 177 to learn how such modules can be included.
After you have compiled mod_proxy as a DSO library module, copy it to the
/usr/lpp/HTTPServer/libexec directory and add the following directives to
corresponding parts of the configuration file:
LoadModule proxy_module libexec/mod_proxy.so
...
AddModule mod_proxy.c
...
ProxyRequests On
The proxy server access can be protected by the same configuration as any
other Web server resource (see 6.2, “Basic Authentication” on page 118). In
that case, users must provide authentication information (usually name and
password) to get access to the proxy function.
If a browser is set up this way (see Figure 18), it sends its requests to the
proxy server at the specified address and port, rather than to the Web server
that might be running on the same machine and behind the same port. The
proxy server will then forward that request to other Web servers, including the
one that might be running on the same machine and port. Note that, as
shown in Figure 18, browsers also allow you to exclude certain domains from
using a proxy.
Because of the rapid growth of the Internet, security has become one of the
most essential issues in network communication. Commercial transactions
have taken place on the Internet and organizations’ internal intranets.
Electronic commerce evolves so rapidly that it is easy to give reasons for
paying attention to security; lapses in security have a price-tag.
The common belief that the Internet is a paradise for scoundrels and villains
who try to swindle your money might be a little overstated, but a network has
its pitfalls. When the Internet started to expand, security was not one of the
main concerns. Everyone was so eager to invent new ways to use this exciting
new, nonprofit network that the concern about security was left aside. Quite
often, only simple ways of hiding information from unauthorized users were
chosen.
The Internet protocol is founded on trust that every party obeys some
common rules. It is easy for a hacker who does not behave according to these
rules to use that trust against the principles of the Internet itself. Today
the Internet and the many intranets cover all aspects of community, from
personal communication to government processes, and from research to business.
It is unfortunate that these kinds of opportunities also attract a few
individuals with something other than honesty in mind.
After introducing some basic elements of security, this chapter will introduce
the important concepts of the security-related features and the configuration
of the security functions of the IBM HTTP Server.
As far as Web servers are concerned, all of these security areas may be
involved.
The simplest (and most common) way to authenticate a user is to ask for a
user ID and a password. This method is good as long as the passwords are
not too easy and are either transferred over the network securely or not at
all. Simple passwords are easily broken by brute-force methods. In most
operating systems it is possible to define some rules for passwords. If such
password rules are too strict, users may have problems remembering their
passwords, which creates another exposure because users have to write them
down.
Except when SSL connections are being used (see 6.4, “Secure Sockets
Layer, SSL” on page 129), the HTTP authentication passwords are
transferred over the network without decent protection. The Base64-encoded
user ID and password can be decoded in no time.
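How little protection Base64 encoding offers can be shown in a few lines of
Python. The sketch below decodes the value of an Authorization header as it
would be sent with basic authentication; the password "secret" and the
function name decode_basic_credentials are assumptions for illustration.

```python
import base64

def decode_basic_credentials(header_value):
    """Split a basic Authorization header value into (user ID, password)."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authentication header")
    decoded = base64.b64decode(encoded).decode("utf-8")
    user, _, password = decoded.partition(":")
    return user, password

# Hypothetical credentials, encoded the way a browser would send them.
header = "Basic " + base64.b64encode(b"ihsadmin:secret").decode("ascii")
print(decode_basic_credentials(header))
```

No key is involved in either direction, so anyone who can read the request on
the network can recover the user ID and password.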
Today there are also more sophisticated methods to authenticate a user. One
of them uses unforgeable SSL client certificates. Client certificates are
described later in this chapter in 6.5, “SSL Client Authentication” on page
146.
Digital certificates are a good way to prove the identity of a person or
another object. Certificates are digital documents that contain information
about the person or object that owns the certificate. A certificate also
contains a public key that can be used for encrypting or decrypting messages.
The basic idea behind a certificate is that a trusted authority, called a
Certificate Authority (CA), has identified you and vouches that you are who
you claim to be. The public key contained in such a certificate can be used to
encrypt information that is intended only for the certificate owner, because
only the certificate owner possesses the secret private key that is required
to decrypt messages encrypted with his or her public key. The certificate and
the public key are freely distributed information. The certificate also
contains the digital signature of the signing Certificate Authority, which
protects the certificate from tampering.
6.1.2.2 Authorization
Access control takes place after authentication. Authorization grants or
denies the user access or the right to perform some operation based on the
user’s identity. In UNIX systems, for example, file access control is based
on the user ID and the access control attributes attached to each directory
and file.
You should be careful with symbolic links, too, because those can
unintentionally enlarge the Web server’s scope. In the server configuration
file you can define whether the Web server should follow symbolic links or
not. It is a good idea to not allow this. The Directory options to look for are
FollowSymLinks and SymLinksIfOwnerMatch.
The use of the UserDir directive can also be a security exposure. If users are
allowed to publish their own Web pages from their home directories by setting
UserDir to something like “./”, accessing http://www.CompanyA.com/~root/
could expose unwanted files to the users.
Alice would like to send confidential e-mail to her colleague Bob. If they used
symmetric cryptography, they would have to agree on a common key they
would use for encryption and decryption. A problem that arises is that the key
cannot be simply exchanged in e-mail messages because of the lack of
security in e-mail. Because of this, Alice and Bob would have to find a way to
securely exchange their key, which might not be easy. Encryption and
decryption would then be done with that single key used on both ends of the
conversation. If either Alice or Bob wanted to securely communicate with
other parties as well, a separate secret key would have to be maintained and
exchanged for each communication partner.
If Alice and Bob used public-key cryptography instead, Alice could have
asked Bob to send his public key to her first. In fact, because it is not a secret
key, Bob could also furnish his public key by any other means, such as
through his Web site. Alice could then encrypt her message using Bob’s
public key and send the encrypted message to him. Only Bob would be able
to decrypt the message because only he is in possession of the private key
that must be used to decrypt that message. Not even Alice would be able to
decrypt the message that she encrypted.
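The exchange between Alice and Bob can be mimicked with a toy example. The
Python sketch below uses textbook RSA with tiny primes and no padding, purely
to show the roles of the two keys; the numbers and function names are
assumptions for illustration and offer no security whatsoever.

```python
# Bob's toy key pair, built from tiny primes (real keys are far larger).
P, Q = 61, 53
N = P * Q                            # modulus, part of both keys
E = 17                               # public exponent: (E, N) is published
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent: Bob keeps (D, N)

def encrypt(message, public_key):
    """What Alice does with Bob's public key."""
    e, n = public_key
    return pow(message, e, n)

def decrypt(ciphertext, private_key):
    """What only Bob can do, using his private key."""
    d, n = private_key
    return pow(ciphertext, d, n)

message = 65                         # a message encoded as a number below N
ciphertext = encrypt(message, (E, N))
print(ciphertext, decrypt(ciphertext, (D, N)))
```

Alice needs only the pair (E, N) to encrypt; without D she cannot reverse the
operation, which matches the observation that she cannot decrypt her own
message. The three-argument pow with a negative exponent requires Python 3.8
or later.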
Now a new problem arises: How does Alice know for sure that the public key
she is using is Bob’s public key and that she is indeed talking to Bob? One
way to verify the integrity of the data in such cases would be for Bob to run
the key through some hash function and then include the resulting hash (also
known as a fingerprint, see below) with the message. Alice can then run the
same hash function and compare the fingerprints. If the fingerprints match,
the key is not corrupt. But this still does not guarantee that the sender was
Bob; it could have been anybody pretending to be Bob, using a fake
private-public key pair. The way to solve this is by involving a trusted third
The above was a very brief and simplified description of how public-key
cryptography works and what certificates are. There will be more details in
the sections that follow.
The IBM HTTP Server implements these methods in the SSL module and in the
authentication methods, as explained later in this chapter.
The IBM HTTP Server also supports the Secure Sockets Layer (SSL) protocol
used for secure connections between browsers and servers. This should not be
confused with authentication; SSL only assures a secure connection that
prevents any attacker from spoofing the network. The SSL module and the SSL
client certificates will be discussed later in 6.4, “Secure Sockets Layer, SSL” on
page 129.
In the configuration file clip above, the location /Docs/Mktg is protected using
basic authentication such that it can be accessed only by people who are
members of the mktg_grp or rd_group groups. The members of these groups
The Require directive defines which users and groups are to be granted
access to the area. The Require directive also accepts the value “valid-user”
that tells the server to grant access to anyone who is listed in the defined
password file.
You may have noticed that the realm name is defined to be the same as in the
previous example (“Protected Material”). This is done in this example
because if the realm names were different and a user from the rd_group
wanted to move from the /Docs/Mktg location to /Docs/RD, he/she would
have to re-authenticate to the server. As long as a user accesses resources
that have the same realm name, he or she is not required to re-enter the user
ID and password.
The /Dev directory is protected such that only the users in the rd_group group
and user ihsadmin are granted access. The realm name is now different
(“Development Site”), so users cannot move from the protected directories
underneath /Docs to /Dev without logging in again.
In the examples above, absolute path names were used for the AuthUserFile
and AuthGroupFile files because using absolute path names lessens the
chance of logical configuration errors. If relative path names were used in the
Directory containers, they would have been relative to the directory specified
by the ServerRoot directive, which represents a security exposure. The path
names of security files like in the AuthGroupFile and AuthUserFile directives
should therefore be absolute.
It is also possible to protect individual files with the <Files> container,
but you should not store any confidential configuration files in the scope of
the Web server, even though they can be separately protected.
The use of .htaccess file is controlled by the AllowOverride directive (see also
4.7.2, “Restricting the Directives within .htaccess Files” on page 60). If the
AllowOverride directive has not been defined within any part of Web server’s
scope, it defaults to AllowOverride All, which is not advisable. In the
The .htaccess file is likely to contain sensitive information like user IDs and
server names that are allowed to connect to the Web server. By definition, the
.htaccess files are located in the Web server’s scope, and may therefore be
accessible from browsers. To protect all .htaccess files in the system, add the
following <Files> container to the httpd.conf file at the top level.
<Files ~ "\.htaccess$">
order deny,allow
deny from all
</Files>
The use of .htaccess files allows you to define multiple things dynamically;
almost any file and directory related directives (except the Directory container
itself) can be defined in an .htaccess file. Before using the .htaccess files, be
sure that you understand the AllowOverride directive (see also 4.7.2,
“Restricting the Directives within .htaccess Files” on page 60).
It is also possible to change the name of the .htaccess file(s) by means of the
AccessFileName directive in the httpd.conf file.
You can create the AuthUserFile authentication password file while adding
the first user to it as shown below.
# htpasswd -c /usr/lpp/HTTPServer/security/users ihsadmin
Adding password for ihsadmin.
New password:
Now you have to define a password for user ihsadmin, which you are about to
create in the /usr/lpp/HTTPServer/security/users file. After that, the
program will ask you to retype the password for verification.
After adding the first user, the contents of the password file will look
something like:
ihsadmin:6uRBipvs0Jc22
The password file has the same layout as a UNIX password file (and uses the
same password encryption algorithm), except that only the first two fields
(user ID, encrypted password) are used. Further fields as found in a UNIX
password file are ignored. There is a temptation for administrators to simply
use the UNIX password file rather than creating and maintaining a separate
file for Web user authentication. This should never be done because the
UNIX password file contains more information and also sensitive user IDs,
such as root.
The group file is a plain text file that can be created and edited with a text
editor. Each line in the group file consists of a group name followed by a
colon and a space, and then the user IDs of the group members in a
space-separated list. Here is an example that defines two groups:
mktg_grp: huey dewey louie
rd_group: dale chip
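The group file layout is simple enough to read with a few lines of code, for
example to check the memberships before deploying a configuration. The
following Python sketch assumes the format described above; the function
names are hypothetical.

```python
def parse_group_file(text):
    """Map each group name to the list of user IDs that belong to it."""
    groups = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore empty lines
        name, _, members = line.partition(":")
        groups[name.strip()] = members.split()
    return groups

def user_in_any_group(groups, user, allowed_groups):
    """True if the user is a member of at least one of the allowed groups."""
    return any(user in groups.get(group, []) for group in allowed_groups)

GROUP_FILE = "mktg_grp: huey dewey louie\nrd_group: dale chip\n"
groups = parse_group_file(GROUP_FILE)
print(user_in_any_group(groups, "dale", ["mktg_grp", "rd_group"]))
```

The membership test mirrors what a Require directive naming several groups
expresses: one matching group is enough.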
The textual password file is adequate for occasional use or for a small
number of users. If it is used frequently or if it contains a large number of
users, it might slow down the server. For large systems that use basic
authentication extensively, the use of indexed database files, as supported
by the mod_auth_dbm module, is preferable.
The first argument is the name of the DBM database file. If the DBM file does
not exist, it will be created. The second argument is the operation that is to be
performed and the third argument, in this case, is the user ID that is to be
added. The command above inserts the user ID ihsadmin into DBM file
/usr/lpp/HTTPServer/security/dbmuserfile.pag. The command will
subsequently ask for the user’s password. The operations that dbmmanage
accepts are listed in Table 11.
Table 11. Operation Arguments for dbmmanage
Operation   Role
add         Adds a user to the DBM file. Requires a fourth argument that is
            the encrypted password for the user.
check       Asks for a password for the user and compares it to the password
            that is stored in the DBM file.
import      Imports user ID and password pairs from a textual flat file into
            a DBM file. The user IDs and the encrypted passwords must be
            separated with a colon (:), as found in UNIX password files.
update      Updates the password of the user and checks that the user exists
            in the DBM file.
view        Shows the selected user ID and encrypted password that are
            defined in the DBM file. If no user ID argument is given, it lists
            all the user IDs in the database. Can be used to import the user
            IDs and passwords into a flat file.
More information about dbmmanage can be found in the man page that ships
with the IBM HTTP Server (see 3.4, “Default File and Directory Structure” on
page 37 for an explanation on how to use the man pages).
User authentication in the IBM HTTP Server can easily be combined with other
authentication methods, such as Kerberos or the Distributed Computing
Environment (DCE). There are many authentication modules available for the
Apache Web server that can be used with the IBM HTTP Server as well.
When the browser requests a protected resource, the Web server returns a
401 error code (Unauthorized) and includes the WWW-Authenticate header.
The WWW-Authenticate header tells the browser which authentication
schemes the Web server supports and the name of the realm that contains
the protected resource. If the user has not already entered a valid user ID
and password for this Web server and realm, the password dialog box is
presented (see Figure 20 on page 120). In subsequent requests, the browser
can take the information from its internal cache or password file. The
browser then requests the same document, but this time with an Authorization
header that includes the user’s ID and password. Thus, when basic
authentication is involved, each single request actually results in two
requests being sent to the Web server, and this also applies to any images
that might be included in a page.
The basic idea of digest authentication is that the Web server does not store
the user’s password encrypted in its authentication files, but stores MD5
hashes of strings that contain user ID, password, and the authentication
realm name.
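The stored value can be reproduced with a few lines of Python. The sketch
below assumes the user:realm:hash line layout written by Apache's htdigest
utility, where the hash is the MD5 digest of the string user:realm:password;
the password "secret" and the function name are illustrative assumptions.

```python
import hashlib

def digest_file_entry(user, realm, password):
    """Build one line in the user:realm:hash layout (assumed, see above)."""
    secret = "%s:%s:%s" % (user, realm, password)
    hashed = hashlib.md5(secret.encode("utf-8")).hexdigest()
    return "%s:%s:%s" % (user, realm, hashed)

digest_line = digest_file_entry("ihsadmin", "Protected Material", "secret")
print(digest_line)
```

Because the realm name is part of the hashed string, an entry created for one
realm cannot be reused for a realm with a different name.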
When the user requests a protected resource from the Web server, the Web
server returns the 401 (Unauthorized) rejection message, which includes the
Setting up digest authentication is fairly easy. You should make sure that the
mod_digest module is configured in the server configuration file(s) with a
LoadModule and an AddModule directive. Follow the instructions for basic
authentication, but specify “Digest” as the authentication method as in the
following example:
<Directory /usr/lpp/HTTPServer/share/htdocs/Confidential>
AuthType Digest
AuthName "Protected Material"
AuthDigestFile /usr/lpp/HTTPServer/security/digestusers
...
</Directory>
The authentication file (digestusers in the example above) can be created and
maintained using the htdigest utility that is shipped with the IBM HTTP Server
You can find more information about the IETF by visiting their Web site at
http://www.ietf.org.
This book only covers the SSL protocol and the key management issues that
are related to it because SSL has become a de facto standard in today’s
Internet and intranet applications.
SSL adds a layer between the network protocols and the protocols that are
used on the application level. It encapsulates TCP/IP sockets so that, in
theory, every application using TCP/IP could use SSL to secure its
connections (Figure 21).
[Figure 21: layer diagram with the application protocols (WWW, POP, SMTP,
e-mail) above the TCP/IP layer]
SSL is usually used for privacy (data encryption) and server authentication.
SSL can optionally be used to authenticate a client by using client
certificates. The client certificates are discussed later in this chapter in 6.5,
“SSL Client Authentication” on page 146.
[Figure 22: SSL handshake between client and server, steps 1 to 6; legible
labels: Pre-Master Secret, Generation of Master Secret (Session Key),
CipherSpec, Handshake Finished]
1. The client (in this case actually the Web browser running on the machine
to the left in Figure 22) makes a request to get an SSL connection from the
server www.CompanyA.com. The client includes in its request, among
others, a session identifier, list of compression methods and encryption
options that the client supports, as well as a random number that will be
used later. Notice that the URL in the client’s request starts with https
rather than http, which is a request for a secure connection.
2. The server includes in its response the encryption options it supports and
its random number. In addition, the server delivers its X.509 certificate to
the client. The X.509 certificate includes the public key of the server. In
this step, the server can optionally request the client to provide its client
certificate for client authentication (see 6.5, “SSL Client Authentication” on
page 146).
3. The client encrypts the random number it has sent in step 1 and the
random number it has received from the server with the server’s public
key and sends this message (also called pre-master secret) to the server.
The server decrypts it with its private key. If the decrypted numbers match
the originals, it proves that the client must have received the server’s
X.509 certificate (with its public key) correctly.
These steps result in a secured, encrypted connection between the client and
server. Server authentication on the Web is done by the Web browser (client);
if the server’s certificate was not signed by a well-known Certificate
Authority, the browser will alert the user and ask whether or not this server
should be trusted (subject to the browser’s individual configuration). Data
integrity is guaranteed by using keyed message authentication codes, or MACs.
The keyed message authentication codes are hashes of messages that are
calculated and included with every message during the SSL handshake
process. MACs are created with secure hash functions, like MD5 or SHA-1.
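The idea of a keyed hash can be illustrated with Python's standard hmac
module. Note that this sketch uses the general HMAC construction as a
stand-in; SSL defines its own MAC computation, so the code shows the principle
only. The key and message values are assumptions for illustration.

```python
import hashlib
import hmac

def make_mac(key, message):
    """Compute a keyed hash of the message (HMAC with SHA-1)."""
    return hmac.new(key, message, hashlib.sha1).digest()

def verify_mac(key, message, mac):
    """True if the MAC matches the message under the shared key."""
    return hmac.compare_digest(make_mac(key, message), mac)

shared_key = b"hypothetical session key"
message = b"GET /index.html HTTP/1.0"
mac = make_mac(shared_key, message)

print(verify_mac(shared_key, message, mac))
print(verify_mac(shared_key, b"GET /other.html HTTP/1.0", mac))
```

Without the shared key, an attacker who alters the message cannot compute a
matching MAC, which is how the receiver detects tampering.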
32, SSL_RSA_WITH_NULL_SHA - x x x
31, SSL_RSA_WITH_NULL_MD5 - x x x
30, SSL_NULL_WITH_NULL_NULL - x x x
Cipher specifications in the server configuration file(s) may be given either
by their complete names, as shown in Table 12 above, or by the corresponding
numbers; the two forms are equivalent. If problems occur, you might want to
change the LogLevel parameter to debug and search the log files for messages
about cipher loading failures. Loading a cipher may fail, for example, when
the SSLCipherSpec name is misspelled or is not valid in a particular export
version.
The log level can be changed with the LogLevel directive and the valid
values for it are debug, info, notice, warn, error, crit, alert and emerg.
IBM provides an application with a graphical user interface for managing the
SSL key database(s). A key database is a file which includes root certificates
of well-known Certificate Authorities and also the SSL-keys that are issued to
the system. The key database is protected with a password that is the key to
manage all the key information in the key database. IKEYMAN is a Java
application that comes with the IBM HTTP Server for handling the key
management procedures. Although it is not necessary to run the IKEYMAN
utility as root user, it is preferable because the managed keyfiles are owned by
the user who starts the IKEYMAN application. Since IKEYMAN is a graphical
application, it must be run on a graphical display.
Before you can start IKEYMAN, you have to define the environment variable
JAVA_HOME:
# export JAVA_HOME=/usr/jdk_base
2. In the upcoming New dialog, select the CMS key database file from the
list of key database types and type in the name and location for the file
and click OK.
The file extension of the key database file should be .kdb.
The location for the kdb file can be freely chosen. In the examples shown
here, the key database file CompanyA.kdb is located in the directory
/usr/lpp/HTTPServer/keys. If the server will be started as root user, you
might want to cut down the access permissions to that directory with the
commands:
# chown root.system /usr/lpp/HTTPServer/keys
# chmod 0700 /usr/lpp/HTTPServer/keys
3. The password dialog box opens, and you are asked for a password for this
database. You might want to define and record the expiration time for the
key database. Click the “Stash the password to a file?” checkbox. If the
password is not stashed into a file, the Web server will not be able to start
automatically. If you do not stash the password to file, you are asked for it
Note
At this point, you have to remember that even if you are creating this
key database for testing purposes only, you might someday want to add
a production key into it. To ensure adequate security, a good password
should be used. Also the password expiration should be defined.
SSL will not work anymore after either the certificate itself expires or the
key database password expires. In the latter case, the server’s error log
contains an error message like “GSK could not initialize, Unrecognized
error code returned from GSK.”
4. Select Create -> New Self-Signed Certificate... as shown in Figure 24.
The following section describes the procedure to request a certificate from
one of the Certificate Authorities whose root certificates are predefined in
the key database.
Here are the directions for requesting a certificate from a trusted
Certificate Authority:
1. Launch the IKEYMAN application and open the key database you have
created.
2. Choose Personal Certificate Requests from the pull down list in the
middle of the application window as shown in Figure 26. Click on the
New... button in the button list that appears on the right.
3. On the dialog that appears, enter the key label for the certificate and fill in
the other information about your certificate. You can choose from two key
lengths: The 512 bit key size is sufficient for most applications unless
maximum security is required. A key of 1024 bit size, on the other hand,
requires more processing power, which might be a performance factor.
4. Specify the directory and file name for the certificate request. A file
with the extension .arm is a PKCS #10 file in armored Base64 format. The
private key is stored in the directory that contains your key database files.
The file name extension of the stored private key database is .rdb. The
locations and names of these files should not be changed.
5. Follow the Certificate Authority’s instructions on how to submit the
certificate request.
6. When you receive the certificate from the CA, it has to be imported
into your key database. To do so, launch the IKEYMAN application and
open the key database.
7. Select Personal Certificates from the pull-down menu and click the
Receive... button on the right. The Receive Certificate from a File dialog
appears (see Figure 27 on page 142).
8. Enter the certificate’s file name and location and click OK.
9. Highlight the new certificate on the list of Personal Certificates and click
View/Edit... The Key information dialog appears.
10. If it is not already selected, select the Set the certificate as the
default checkbox. If other software uses the same key database, this change
may affect it.
The certificate is now ready for use. Do not forget to add a reminder to your
calendar when the key database password or certificate is going to expire.
Some of the CAs send e-mail about a month before the certificate expires.
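The reminder date can be calculated instead of estimated. The following
Python sketch is only an illustration: the expiration date and the 30-day
lead time are hypothetical values, and the real expiration date has to be
taken from the certificate itself.

```python
from datetime import date, timedelta


def reminder_date(expires_on, lead_days=30):
    """Return the date on which a renewal reminder should be raised."""
    return expires_on - timedelta(days=lead_days)


def days_left(expires_on, today):
    """Return the days remaining until expiry (negative once expired)."""
    return (expires_on - today).days


# Hypothetical expiration date, used for illustration only.
expiry = date(2000, 3, 31)
print("Remind on:", reminder_date(expiry))
print("Days left:", days_left(expiry, date(2000, 3, 1)))
```

The same calculation applies to the key database password if an
expiration period has been defined for it.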
2. Click on the Add... button to add the CA’s root certificate from a file.
3. On the pop-up dialog, verify that the file type is Base64-encoded ASCII
data. Fill in the location and file name of the certificate file.
4. Click OK to mark the certificate trusted and to store it.
After completion, the new root certificate will show up in the Signer
Certificates list. The root certificate is now available to every certificate you
intend to include in this key database.
The default port number for SSL is 443. In order to achieve this, defining a
virtual host comes in handy. When editing the httpd.conf file, keep in mind
that comments within the configuration sections are not allowed. The following
actions guide you through these steps:
1. First add the following row into the httpd.conf file as the first item of the
LoadModule list:
LoadModule ibm_ssl_module libexec/mod_ibm_ssl.so
2. Add the following row as the first line to the AddModule list:
AddModule mod_ibm_ssl.c
3. Add the port number for the virtual server just below the “Listen 80”
statement. The default port number for SSL is 443.
Listen 443
4. Check that you have defined the ServerName directive:
ServerName www.CompanyA.com
Add the following text block to the end of httpd.conf:
<VirtualHost :443>
SSLEnable
SSLClientAuth none
DocumentRoot /www/html/CompanyA
ErrorLog /www/logs/CompanyA/error_log
TransferLog /www/logs/CompanyA/access_log
</VirtualHost>
SSLDisable
Keyfile /usr/lpp/HTTPServer/keys/CompanyA.kdb
SSLCacheEnable
SSLCachePortFilename /usr/lpp/HTTPServer/tmp/siddfile
The SSL timeout parameters are related to caching of the SSL session IDs.
SSL session IDs should be cached in order to reduce the expense of
repeating SSL handshaking. The IBM HTTP Server uses an internal daemon
process sidd to cache the SSL session IDs to a file that is accessible by the
HTTP server processes. Make sure that the file and the directory defined in the
directive SSLCachePortFilename are writable by the user the server processes
run under.
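The effect of a session ID timeout can be pictured with a small model. The
Python sketch below is a toy illustration only and does not describe how
the sidd daemon is implemented; the class name, its methods, and the
injectable clock are assumptions made for this example.

```python
import time


class SessionCache:
    """Toy model of an SSL session-ID cache with a fixed timeout."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock      # injectable so the model can be tested
        self._expiry = {}       # session ID -> time at which it expires

    def store(self, session_id):
        """Remember a session ID after a full handshake."""
        self._expiry[session_id] = self.clock() + self.timeout

    def can_resume(self, session_id):
        """Return True if the ID is cached and has not timed out yet."""
        expires_at = self._expiry.get(session_id)
        if expires_at is None:
            return False        # unknown ID: a full handshake is needed
        if self.clock() >= expires_at:
            del self._expiry[session_id]
            return False        # timed out: a full handshake is needed
        return True


cache = SessionCache(timeout_seconds=100)
cache.store("session-1")
print(cache.can_resume("session-1"))
print(cache.can_resume("session-2"))
```

A longer timeout saves more handshakes for returning clients, at the cost
of keeping session IDs valid for a longer time.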
The following example defines two Web sites in the same httpd.conf file. This
requires four VirtualHost containers to be defined in order to have it working:
<VirtualHost 1.2.3.4>
ServerName www.CompanyA.com
ServerAdmin [email protected]
DocumentRoot /www/html/CompanyA
ErrorLog /www/logs/CompanyA/error_log
TransferLog /www/logs/CompanyA/access_log
</VirtualHost>
<VirtualHost 1.2.3.5>
ServerName www.CompanyB.com
ServerAdmin [email protected]
DocumentRoot /www/html/CompanyB
ErrorLog /www/logs/CompanyB/error_log
TransferLog /www/logs/CompanyB/access_log
</VirtualHost>
<VirtualHost 1.2.3.4:443>
SSLEnable
SSLClientAuth none
<VirtualHost 1.2.3.5:443>
SSLEnable
SSLClientAuth none
SSLServerCert Company B
ServerName www.CompanyB.com
ServerAdmin [email protected]
DocumentRoot /www/html/CompanyB
ErrorLog /www/logs/CompanyB/error_log
TransferLog /www/logs/CompanyB/access_log
</VirtualHost>
SSLDisable
Keyfile /usr/lpp/HTTPServer/keys/Keyfile.kdb
SSLV2Timeout 100
SSLV3Timeout 1000
Note
The file httpd.conf.sample.ssl that ships with the SSL module of the IBM
HTTP Server contains a wealth of information in the form of comments
that further explain the setup of SSL, including client authentication
(subject to the next section). This file is located by default in
The need for client authentication and the level of identification depends
greatly on the needs of Web site owners. For example, the owner of a Web
site that needs to be very certain about the identity of the individuals who
have access to their Web page might choose to run their own CA software
The IBM HTTP Server supports client certificates issued by any CA software that
is capable of issuing X.509 certificates. If you are going to use certificate
revocation lists (CRL), the IBM Vault Registry is suitable to provide that
function. The CRL is a database of certificates that are revoked before their
expiration date for any reason. You can find more information about the IBM
Vault Registry at http://www.software.ibm.com/commerce/registry.
If you choose required, only users with valid certificates that are signed by a
trusted CA are granted access. The optional value causes the server to ask
for the client certificate, but it is not necessarily required. This option is often
used to allow more specific authentication for certain administrative users.
There are two types of access control that can be used in conjunction with
SSL client authentication. The first, fake basic authentication, uses the client
certificate’s distinguished name as the user for normal basic authentication.
The fake basic authentication functionality is deprecated and should be
avoided where possible because it does not provide effective authentication.
The directive to specify fake basic authentication is SSLFakeBasicAuth. The
better alternative for client authentication is to use the
SSLClientAuthRequire directive.
There are some security considerations in the httpd.conf file. As has been
mentioned earlier in 3.6, “Initial Setup” on page 42, the server should not be
run under the user account nobody because there are other processes using
the same account. The best approach is to define a dedicated user and group
for running the server processes.
As discussed before, you should beware of symbolic links that can easily and
inadvertently enlarge the scope of the server. The Directory options to look
for are FollowSymLinks and SymLinksIfOwnerMatch.
Server-side includes (SSI, see also 10.4, “Server-Side Includes” on page
219) make it possible to execute local programs from HTML
pages. This can be a useful feature on some occasions, but you might want to
deny the execution by defining:
Options IncludesNOEXEC
The use of the .htaccess files can also be controlled. One way to deny all the
.htaccess overrides, includes and accesses is to create a directory container
for the root directory of the server:
<Directory />
AllowOverride None
Options None
Allow from all
</Directory>
You might also consider denying the access to the root directory by changing
the “Allow from all” above to:
Order deny,allow
Deny from all
That kind of access control denies access to all locations except to the ones
that are specifically allowed in other directory or location containers. The
sequence to parse the containers is Directory, Files and Location. Beware of
As discussed in 5.3, “User Directories” on page 85, the UserDir directive can
also cause some security exposures if it applies to the root user. This can be
avoided by defining:
UserDir disabled root
[Figure: Clients, Network, Network Interface, Web Server, Operating System, CPU, Memory, Disk Storage]
A typical Web server usually only requires a system with a single processor
and a limited amount of memory and hard disk storage space. However,
companies may not purchase a dedicated Web server machine for providing
electronic information to their clients, unless they are in the ISP area of
business, or have a large number of clients. Thus, in small environments, the
Web server may have to share CPU, memory, disk, and network resources
with other applications and the operating system running on the same
machine.
7.1.2.1 CPU
The CPU is mostly used for client request processing and in the rare instance
when the IBM HTTP Server parent process spawns new child processes to
handle a new request. Client-request processing can be as easy as simply
delivering a static HTML page, or it can involve a considerable amount of
application code. Depending on the amount of extra processing necessary for
client requests, the CPU can be a limiting factor, although in many cases the
CPU is not the dominating bottleneck of a Web server machine.
Spawning new child processes can slow down the operation of a Web server
if it happens too often. The IBM HTTP Server keeps a certain number of httpd
processes running that can process client requests in parallel (see also 3.7,
“Server Process Structure” on page 43). The minimum and maximum number
of these httpd processes, along with other, related numbers, can be
configured (see 7.1.3.1, “Process Handling” on page 158). If the configuration
is balanced, there should not be many httpd processes spawned over time in
normal operation and, therefore, the Web server performance is not
considerably affected by spawning new processes. Process spawning may
become more of an issue when CGI programs and application code are
involved. Please read 10.2.3, “CGI Performance Considerations” on page 217
for more information.
For machines that are required to handle heavy Web serving functions, as
well as other applications, databases, or programs that are CPU-intensive, a
multiprocessor system may be considered. Programs can then run
concurrently in these machines and not be held back by Web processes. The
IBM HTTP Server will use all CPUs in a multiprocessor system. On the other
hand, webmasters are reminded that multiple processors do not always
increase the performance tremendously because other common resources,
such as memory, disk storage, and network connections are still shared by all
processors and programs. Please refer to 7.3, “Scalability for the IBM HTTP
Server” on page 171 for further discussions.
7.1.2.2 Memory
Together with the raw performance of the CPU, the amount of available
physical memory greatly affects Web server performance. There should be
enough physical memory available that the system does not need to start
paging (see the next section). The amount of memory that is necessary to
fulfill this requirement greatly depends on the following factors:
• Requirements of any other application running at the same time
Note: The amount of additional memory required by DSO modules depends not
only on the number and size of those DSO modules, but also on the
number of httpd server processes that are running at the same time, since
each httpd server process may instantiate the DSO modules.
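As a rough planning aid, the relationship between physical memory and the
number of httpd processes can be written down as simple arithmetic. The
Python sketch below is an approximation under stated assumptions: all
figures are hypothetical, and the per-process size (including its DSO
modules) has to be measured on the actual system.

```python
def max_httpd_processes(physical_mb, reserved_mb, per_process_mb):
    """Estimate how many httpd processes fit into memory without paging.

    physical_mb    -- installed physical memory
    reserved_mb    -- memory set aside for the operating system and
                      any other applications on the same machine
    per_process_mb -- measured size of one httpd process with its modules
    """
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    available = physical_mb - reserved_mb
    return max(0, int(available // per_process_mb))


# Hypothetical figures: 512 MB installed, 128 MB reserved, 4 MB per process.
print(max_httpd_processes(512, 128, 4))
```

An estimate of this kind gives an upper bound to keep in mind when choosing
the maximum number of httpd processes; it does not replace monitoring the
paging activity of the running system.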
However, should paging occur, the paging space(s) on the disk(s) should be
placed so that the impact on performance is as small as possible. The three
most important rules for paging space allocation are:
• Have a paging space on every disk in the system (provided they have
about the same average access time).
• Only use one paging space per disk. Multiple paging spaces per disk
reduce paging performance.
• Check the characteristics of each disk and define the paging space at the
best-performing place (usually center or edge, depending on disk model).
Please read the General Guidelines at the end of this section for more
discussion on this topic.
General Guidelines
The three directives MaxSpareServers, MinSpareServers and StartServers
are closely related because they control the number of httpd processes
running on the Web server. Basically, only very busy sites need to adjust
them, and three factors should be considered when doing so: the operating
system, the number of preloaded modules, and the machine load. If the
machine load is high, increase the MinSpareServers and StartServers
values, but in general none of these directives should be set too high. A
good idea is to set the MinSpareServers and MaxSpareServers directives to
similar values, or even the same value. A value for the MaxSpareServers
directive close to that of MaxClients (mentioned above) optimizes the
response time since it minimizes the chance that a new process needs to be
spawned prior to client request processing.
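The interplay of MinSpareServers and MaxSpareServers can be pictured as a
simple decision on the number of idle httpd processes. The Python sketch
below is a simplified model, not the server's actual maintenance logic; the
function name and the returned strings are hypothetical.

```python
def spare_server_action(idle, min_spare, max_spare):
    """Decide what to do with the pool of idle httpd processes.

    Returns "spawn" when fewer than min_spare processes are idle,
    "kill" when more than max_spare are idle, and "none" otherwise.
    """
    if min_spare > max_spare:
        raise ValueError("min_spare must not exceed max_spare")
    if idle < min_spare:
        return "spawn"
    if idle > max_spare:
        return "kill"
    return "none"


# Hypothetical settings: MinSpareServers 5, MaxSpareServers 10.
for idle in (2, 7, 12):
    print(idle, spare_server_action(idle, 5, 10))
```

The wider the gap between the two values, the more the number of idle
processes can drift before the server spawns or removes a process.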
KeepAliveTimeout — Sets the amount of time the IBM HTTP Server holds
the connection for a subsequent request before closing it on
acknowledgment. Webmasters should consider the trade-off between
network bandwidth and server resources when changing this value (the
default is 15 seconds). However, it is not advisable to raise this value to more
than 60 seconds. The timeout value after the request has been received is
governed by the Timeout directive mentioned below. A large value is not
advisable because it blocks the system resources if no requests are
submitted.
Timeout — Sets the amount of time the IBM HTTP Server waits for these
three events:
• Time taken to receive a GET request
• Time taken between receipt of TCP packets on a POST or PUT request
• Time taken between acknowledgments on transmissions of TCP packets
in responses
Note: The directives explained in this section are not included in the default
configuration file since they are normally not used, because they can (or even
should) be specified on the operating system level, if required. These
directives should only be used when the values need to be set lower than
what the operating system permits.
RLimitMEM — Sets the soft and hard limits for the memory usage (in bytes)
per httpd child process. The webmaster should consider this value when
deciding on the corresponding value in the MaxClients directive in 7.1.3.1,
“Process Handling” on page 158.
RLimitNPROC — Sets the soft and hard limits for the number of processes
per user. For the case of CGI processes running under the Web server’s UID
(which is the normal case), the limitation set with this directive restricts the
number of processes the server itself can create by forking. Thus, it might
limit the server’s ability to create new httpd processes.
These results are not cached within the server, which causes the checking
procedure to occur for each client request. The same applies for the use of
SymLinksIfOwnerMatch, which is used for security purposes. To optimize the
use of the Options directive with these two parameters, here is what the
webmaster can do:
DocumentRoot /www/htdocs
<Directory />
Options FollowSymLinks
</Directory>
<Directory /www/htdocs>
Options -FollowSymLinks +SymLinksIfOwnerMatch
</Directory>
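The saving achieved by this configuration can be estimated by counting the
path components that need a symbolic link check. The Python sketch below is
a model only: it assumes one check per path component wherever
FollowSymLinks is not in effect, as the Apache performance notes describe,
and the function name is hypothetical.

```python
from pathlib import PurePosixPath


def components_to_check(filename, unchecked_prefix="/"):
    """List the path components that need a symbolic link check.

    Components that are part of unchecked_prefix (a directory reached
    with FollowSymLinks in effect) are skipped.
    """
    prefix = PurePosixPath(unchecked_prefix)
    current = PurePosixPath("/")
    checked = []
    for part in PurePosixPath(filename).parts[1:]:
        current = current / part
        if current == prefix or current in prefix.parents:
            continue    # covered by FollowSymLinks, no check needed
        checked.append(str(current))
    return checked


# Without the optimization, every component is checked.
print(components_to_check("/www/htdocs/index.html"))
# With FollowSymLinks in effect down to the DocumentRoot.
print(components_to_check("/www/htdocs/index.html", "/www/htdocs"))
```

In this model, the checks for the DocumentRoot path itself are avoided on
every request, and only the components below it remain.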
Here again, if performance is the key issue for the Web server, webmasters
can just specify AllowOverride None for all the directories.
When it comes to performance, using a wildcard for the directive would cause
the server to depend on content negotiation (introduced in 2.1, “Features of
the Apache Server” on page 11) to find out what the client’s browser is
It is, however, recommended to specify a specific list so that the server does
not spend time and resources trying to get the best match. The following is an
example of such an explicit list:
DirectoryIndex index.cgi index.pl index.shtml index.html
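With an explicit list, finding the index document is a simple first-match
search. The Python sketch below illustrates the idea only; the function
name and the directory contents are hypothetical.

```python
def pick_index(index_names, files_in_directory):
    """Return the first name from index_names that exists in the directory.

    The names are tried in the order given, so earlier entries take
    precedence. Returns None if no index document is present.
    """
    available = set(files_in_directory)
    for name in index_names:
        if name in available:
            return name
    return None


directory_index = ["index.cgi", "index.pl", "index.shtml", "index.html"]
# Hypothetical directory contents, for illustration only.
print(pick_index(directory_index, ["logo.gif", "index.html", "index.shtml"]))
```

Because the search stops at the first match, the most frequently used name
is best listed early.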
The following shows some useful utilities on AIX and a simple example of
usage. Most of these utilities have other command-line options that let you
vmstat
This very basic yet powerful utility displays some statistics about system
resource usage, such as CPU utilization (percentage of user, system, idle,
and waiting times), memory usage, paging activities (page lists, pageins,
pageouts), and number of runnable and blocked processes. The statistics
can be shown once (since system start), or periodically. The following shows
an example where four records are to be displayed with a five second
interval:
# vmstat 5 4
kthr     memory             page              faults        cpu
----- ----------- ------------------------ ------------ -----------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 0  0 18631   131   0   0   0   0    0   0 263  427 150  4  1 95  0
 0  0 18631   131   0   0   0   0    0   0 183  515 141  5  2 93  0
 0  0 18632   130   0   0   0   0    0   0 171  864 187  9  3 88  0
 0  0 18632   130   0   0   0   0    0   0 240  343 133  7  1 92  0
The most important information to look for and to be aware of is if there are
any blocked processes (second column), if there is much paging going on,
and if the CPU spends too much time waiting for I/O (last column) or user
processes. There should not normally be any blocked processes, there
should be no or very little paging activity and the CPU should, at least for
some periods, report some idle time (second last column).
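These checks are easy to automate. The Python sketch below parses the
vmstat sample shown above and flags the conditions just described. It is a
sketch under assumptions: the 20 percent threshold for I/O wait is a
hypothetical value, and the column layout is the 17-column format of this
sample.

```python
# Column names for the 17 fields of the sample. The second "sy" (CPU
# system time) is renamed cpu_sy so that it does not collide with the
# "sy" column under "faults" (system calls).
COLUMNS = ["r", "b", "avm", "fre", "re", "pi", "po", "fr", "sr", "cy",
           "in", "sy", "cs", "us", "cpu_sy", "id", "wa"]

SAMPLE = """\
kthr     memory             page              faults        cpu
----- ----------- ------------------------ ------------ -----------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 0  0 18631   131   0   0   0   0    0   0 263  427 150  4  1 95  0
 0  0 18631   131   0   0   0   0    0   0 183  515 141  5  2 93  0
 0  0 18632   130   0   0   0   0    0   0 171  864 187  9  3 88  0
 0  0 18632   130   0   0   0   0    0   0 240  343 133  7  1 92  0
"""


def parse_vmstat(text):
    """Return one dict per data line; header lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == len(COLUMNS) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(COLUMNS, (int(f) for f in fields))))
    return rows


def find_problems(row, max_wait_percent=20):
    """Return a list of warnings for one vmstat record."""
    problems = []
    if row["b"] > 0:
        problems.append("blocked processes")
    if row["pi"] > 0 or row["po"] > 0:
        problems.append("paging activity")
    if row["wa"] > max_wait_percent:     # hypothetical threshold
        problems.append("high I/O wait")
    if row["id"] == 0:
        problems.append("no idle CPU time")
    return problems


for row in parse_vmstat(SAMPLE):
    print(row["us"], row["cpu_sy"], row["id"], row["wa"], find_problems(row))
```

For the sample above, no record is flagged, which matches the picture of a
lightly loaded system.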
iostat
The iostat utility reports CPU and I/O utilization statistics for system
devices such as TTYs, disks, and CD-ROM drives. The following is an example
of the usage of iostat, where two sets of statistics are to be displayed
with a one second interval:
# iostat 1 2
12:53:21 47 10 0 42
11147 0.00 3.60 0.00
12:53:26 13 9 0 78
11147 0.00 0.20 0.00
12:53:31 7 6 0 87
11147 0.00 0.00 0.00
Average 23 8 0 69
Average 11147 0 1 0
The example above shows some configuration values and traffic statistics for
the Token-Ring interface tr0. Note that no errors are reported for this
interface.
ps
Displays statistics about the processes running in the system, such as
process ID, memory and CPU utilization. This is a very handy tool to verify
the status and resource consumption of running processes. The following
example displays an excerpt from an output of the ps command.
# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Nov 24 - 0:11 /etc/init
ausres4 2160 3384 0 Nov 24 - 0:58 /usr/dt/bin/dtsession
root 2348 1 0 Nov 24 - 0:00 /usr/dt/bin/dtlogin -daemon
root 3200 1 0 Nov 24 - 0:00 /usr/vice/etc/afsd -nosettime
root 3384 2348 0 Nov 24 - 0:00 dtlogin <:0> -daemon
root 3750 1 0 Nov 24 - 0:23 /usr/vice/etc/afsd -nosettime
root 3888 1 0 Nov 24 - 0:00 /usr/vice/etc/afsd -nosettime
root 4148 1 0 Nov 24 - 3:02 /usr/sbin/syncd 60
root 4390 1 0 Nov 24 - 0:00 /usr/lib/errdemon
...
Other valuable information about each process can be displayed using the
many options that are supported by this utility. For example, ps gv gives you
accurate information about each process’ memory utilization.
A more detailed description of PTX is beyond the scope of this book and you
should refer to the product documentation for more information.
Other Utilities
Besides the most commonly used utilities described above, there are some
more tools that can be utilized, such as svmon for checking real memory
usage, iptrace to capture the data exchanged between nodes on the network,
tprof and netpmon to find out how much CPU time a process is using, and
filemon for monitoring file system activity.
Syntax:
ab [options] [http://]hostname[:port]/path
Options:
-n requests      Number of requests to perform
-c concurrency   Number of multiple requests to make
-t timelimit     Seconds to max. wait for responses
-p postfile      File containing data to POST
-T content-type  Content-type header for POSTing
-v verbosity     How much troubleshooting info to print
-V               Print version number and exit
-k               Use HTTP KeepAlive feature
-h               Display usage information (this message)
A sample output after running this command is shown next. Note that the
example scenario chosen here involves a CGI script that, as we will explain in
10.2.3, “CGI Performance Considerations” on page 217, results in a
considerably lower response rate as compared to serving static HTML pages.
Concurrency Level: 20
Time taken for tests: 6.769 seconds
Complete requests: 200
Failed requests: 0
Total transferred: 1930708 bytes
HTML transferred: 1872286 bytes
Requests per second: 29.55
Transfer rate: 285.23 kb/s received
The most important results that webmasters are probably interested in are
the following three items shown:
Complete requests: 200
Failed requests: 0
Requests per second: 29.55
The utility can best be used to evaluate the effect of a certain configuration
change by comparing the reported numbers before and after the change.
Webmasters can vary the option values to impose different conditions on the
data collection performed by the ab utility.
Note that the ab utility cannot be used to test resources that are SSL enabled
due to different protocol handshaking issues.
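The derived figures in the ab output follow directly from the raw counters,
which is useful when comparing two runs. The Python sketch below recomputes
them from the sample output shown above; the function names are
hypothetical, and the assumption that one kb equals 1000 bytes is inferred
from the sample figures.

```python
def requests_per_second(complete_requests, seconds):
    """Completed requests divided by the elapsed test time."""
    return complete_requests / seconds


def transfer_rate_kb(total_bytes, seconds):
    """Transfer rate in kb/s, assuming 1 kb = 1000 bytes."""
    return total_bytes / seconds / 1000.0


def relative_change(before, after):
    """Change between two measurements in percent of the first one."""
    return (after - before) / before * 100.0


# Figures taken from the sample output above.
print(round(requests_per_second(200, 6.769), 2))
print(round(transfer_rate_kb(1930708, 6.769), 2))
```

The relative_change function can then be applied to the requests per second
measured before and after a configuration change.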
While tuning does increase the performance of a Web server, other methods
and plans need to be prepared if the load grows beyond expectations. The
term scalability generally addresses the issue of growing beyond the
capabilities of a single server. If a server is not powerful enough for the load it
should handle, there are basically two solutions: Upgrade the hardware to a
more powerful system (for example by adding more processors to a
multi-processor system) or add other systems and create some kind of a
single-system image (for example by adding nodes to an RS/6000 SP
system). The first solution does not require any special setup other than
some possible configuration changes to optimize performance on the new
system. For planning purposes, however, Web server hardware should be
selected so there is room for upgrading should it become necessary. The
latter solution does require some special considerations. Adding additional
server machines provides a path for almost unlimited growth, but it creates
some new obstacles to overcome. The most obvious concern is that multiple
servers will have multiple IP addresses associated with them. Thus, a user
only accesses a particular server by specifying its hostname in the URL of a
request, unless special methods are implemented that let multiple systems
appear as if they were one single system. Such methods are available that
also incorporate some means of load balancing among the servers. Another
issue is that multiple servers must be able to access the same data,
especially when Web applications are involved.
The two most commonly used methods for load balancing and single system
imaging are Round-Robin DNS (RR-DNS) and specialized vendor products,
[Figure 32: Clients resolving www.CompanyA.com through a Round-Robin DNS server to the Web servers 1.1.1.1, 1.1.1.2, and 1.1.1.3]
Clients in the upper left corner (see Figure 32) request a name resolution for
the same hostname, but get different IP addresses to spread the load among
the Web servers. In a simple configuration, the RR-DNS server will just cycle
through the list of available servers. This works fine if the requests create
about the same load on a server, and the servers have the same performance
characteristics. This is, however, not real load balancing. Most actual
[Figure: eNetwork Dispatcher (Manager, Advisors) between Clients and Web Servers; labels: Requests, Responses, Load Information]
The manager, as part of the IBM eNetwork Dispatcher product, routes the
clients’ requests to the appropriate Web server based on current load
In the following three sections, solutions are briefly introduced and explained.
Besides the performance advantage, GPFS offers means for availability, such
as data replication and self-recovery after error detection. It is, however,
limited to an RS/6000 SP system environment and cannot be used among
standalone RS/6000 systems.
The major advantages of DFS are its scalability and performance. It has a
unique protocol (on top of UDP) to manage the file and control information
exchange between servers and clients. The superior performance results
from its efficient, adaptive protocol and client-side caching. Scalability is
achieved by its distributed database that stores and manages fileset location
information. Users do not have to know the location of any file; DFS appears
to them as one single file tree, no matter how large the file space is and how
many DFS file servers there are. The DFS file tree is the same on every client
that accesses DFS. Administrators manage the file space in units (called
filesets) and assign them to individual DFS file servers. Moving filesets to
different DFS file servers can be done online and even replicating filesets for
improved availability is supported.
The IBM site in Austin, Texas, for example, runs a large DFS file space with
more than 4 TB of online storage. The site Web server, based on Apache, has
access to this DFS file space, namely the users’ home directories in DFS.
Through the mechanism described in 5.3, “User Directories” on page 85,
each user can publish his/her private Web pages just by saving files in a
particular subdirectory of his/her home directory. There is no need for any
file transfer or any other administrative action. As soon as the user saves a
file, it is immediately available through the Web server to anybody. This is a
very effective way of Web publishing when multiple authors are involved,
especially considering the numbers: there are several thousand users
with personal home directories in DFS at the IBM Austin site.
DFS file servers and clients are available from most major vendors for a
variety of platforms. More information about DFS can be found at
http://www.openafs.org
Extending the capabilities of the IBM HTTP Server can be accomplished with
modules. This chapter provides a technical introduction to DSO modules. It
also provides examples of how to build dynamically loadable modules from
their source files. There is an extensive list of modules that people around
the world have written for Apache, but to use these modules with the IBM
HTTP Server, they need to be built against the IBM HTTP Server header
files. The module-build process is shown using a common example,
mod_info. This module extends the IBM HTTP Server such that its status can
be queried using a Web browser and the appropriate URL.
The Apache server has a module table containing hooks for the modules to
attach to. Hooks are logical representations of the events at which the
server hands execution over to the DSOs. When such an event occurs, the
server executes the program code in the modules that is dedicated to
handling that event. These pieces of program code, or functions, are known
as handlers, as illustrated in Figure 34 on page 178. Thus, at compile and
link time, the handlers register themselves in the module table containing
hooks so that the handlers and the hooks build the run-time connection
between the IBM HTTP Server program and the modules. The HTTP server program
then activates the respective DSO when necessary and leaves the execution to
the modules. This is all completely transparent to the clients (Web browsers
and the users).
[Figure 34: Module Table, Module, Handler]
As explained earlier, the modules register themselves into the module table
when building (compiling/linking) the server core using a structure called
module. Each module uses this structure to register its handlers for the hooks
available in the server. As an example, the structure of the CGI module
(mod_cgi) is shown below:
The Apache server currently supports no fewer than 18 different hooks (as can
be seen in the module structure above) that modules can attach to in order to
add some functionality to the server.
Hooks, in general, can be grouped into two categories: those concerning the
server’s environment, called the Config Phase, and those concerning the
clients’ requests, called the Request Phase. The first six hooks discussed in
the following section deal mainly with the server configuration and module
initialization stage when the server starts up. In this stage, the server reads
the appropriate configuration file(s) before it does any client request
processing in the request phase. These configuration files include the
httpd.conf and the .htaccess files. Following that, the modules become
initialized since the server knows which of them are registered for the hooks
in the module table.
Initialization (1) – The modules that cling to this hook are invoked after the
server is configured and started in order to perform one-time setup steps of
the environment at the module initialization process.
Create Directory Config (2) – There are two occasions when this hook
invokes the modules that are registered to it. One occasion is during the
configuration process when the server reads and processes the default
setting for the main server’s directory configuration. The other occasion is
during specific directory configuration, with reference to the directives defined
in the .htaccess file or the server’s configuration file(s). In any case, if the
server finds a module’s directive defined in these configuration files, the
particular module that defines the directive is called to do the necessary
configuration on the directory specified as the argument of the directive.
Merge Directory Configs (3) – This hook takes care of conflicts between
directive usage in the parent and subdirectories. When the server hits such
conflicts, the respective module is invoked and it resolves the conflicts to
produce the most appropriate configuration for that directory. From then on,
subsequent hooks will make use of this new configuration during the client’s
request processing.
Create Server Config (4) – This hook invokes modules that perform
configuration that affects the environment of the entire server. It is used, for
Merge Server Configs (5) – Like the Directory Merger hook, this hook also
fine-tunes and resolves any conflicts between servers.
Commands Table (6) – This hook points to a list of directives and their
respective attributes defined in the modules. These attributes include the
syntax, their default values, context, override flag, status, and so on. All this
information is checked against those made in the configuration files during
the configuration-reading process.
Content Handling (7) – This hook points to a table containing the name and
function of each handler, such that the server knows who to locate if the need
Translate Path (8) – During the translation phase, the server calls any
module registered for this hook in order to allow them to translate the URL
into a filename. Once a translation is done, the server suppresses the rest of
the requesting modules to prevent further redundant translation. However, if
no module is interested in doing the translation, the core server translates it
with reference to the DocumentRoot directive defined in the server
configuration file(s).
Verify User ID (9) – This hook comes after the access checking phase (see
#11 below) and checks the credentials of the users such as the user ID and
password, against the authorization database defined in the server. The
server stops processing other modules on this hook as soon as one module
completes the validation. On the other hand, if no module performs this task,
the server aborts the request with an error message sent to the client.
Verify User Access (10) – This is the last phase for any security verification
to be done before the client’s request is finally accepted. After knowing who is
requesting, the server moves on to check whether the client has the access
rights to obtain the requested document by comparing the credentials
collected in the User Identification phase with the Require directive defined,
for instance, in the .htaccess file in the specific directory (see also 6.2, “Basic
Authentication” on page 118). Here again, there should be at least one
module performing the validation, otherwise, the request is aborted and an
error is returned.
Check Access (11) – This is the first phase where no credentials from the
client are requested, but rather, based on information like the client’s IP
address, the server invokes modules to do a basic check of the client. The
server returns a permission denial error message if any of the modules
denies access to this client.
Check Type (12) – After all the validation and verification of the client, the
eligible client’s request is passed on to the modules to determine the type of
document requested. Thereafter, the one module that has completed the
determination of the document type informs the core server so that the server
can inform the client (Web browser) to act accordingly, for example to ask the
user whether to save a file to disk or open for display.
Fixups (13) – Before the server processes the document and returns it to the
client, the server offers this hook for any modules that wish to perform some
Logging (14) – At this point, the client’s request has already been handled,
but this hook allows any modules to capture the events that happened
throughout the request parsing process for logging or future references.
Header Parse (15) – The server invokes the modules registered to this hook
to do a basic check at this early stage based on the request headers and
translated filename. There is no standard module defined for this hook.
Child Init (16) – Modules registered for this hook are called whenever a
new child process is spawned by the main process.
Child Exit (17) – This hook informs the registered modules before a child
process terminates for them to perform necessary actions.
Post-Read Request (18) – This is the very first phase that checks the clients’
requests after reading the request headers. This hook permits the invoked
modules to make necessary decisions based on the raw request, but forbids
them to make any modification at this point.
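The difference between hooks that stop at the first module to respond (such as Translate Path) and hooks on which every module is consulted (such as Check Access) can be sketched in a few lines. The following Python sketch is an illustration only, not the server's actual C implementation. The names OK and DECLINED mirror return codes of the Apache module API; FORBIDDEN, the function names, the request dictionary, and the DOCUMENT_ROOT value are assumptions made for this example.

```python
# Status values used by this sketch (stand-ins for the C return codes).
OK, DECLINED, FORBIDDEN = "OK", "DECLINED", "FORBIDDEN"

DOCUMENT_ROOT = "/webserver/htdocs"  # assumed value for illustration


def run_first(handlers, request):
    """Call handlers in order and stop at the first one that does not decline."""
    for handler in handlers:
        status = handler(request)
        if status != DECLINED:
            return status
    return DECLINED


def run_all(handlers, request):
    """Call every handler; stop early only if one reports an error."""
    for handler in handlers:
        status = handler(request)
        if status not in (OK, DECLINED):
            return status
    return OK


def alias_translate(request):
    """Hypothetical module: maps /admin/... to /webserver/admin/..."""
    prefix = "/admin/"
    if request["uri"].startswith(prefix):
        request["filename"] = "/webserver/admin/" + request["uri"][len(prefix):]
        return OK
    return DECLINED


def core_translate(request):
    """Fallback used when no module translated the URL."""
    request["filename"] = DOCUMENT_ROOT + request["uri"]
    return OK


def translate(request):
    status = run_first([alias_translate], request)
    if status == DECLINED:
        status = core_translate(request)
    return status
```

With this sketch, a request for /admin/index.html is translated by the hypothetical alias module, while a request for /index.html falls through to the DocumentRoot-based translation.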
Table 15 lists the hooks and modules that apply to the request phase.
Table 15. Module Matrix, Request Phase
Hook                      Modules
Content Handling (#7)     mod_actions, mod_asis, mod_autoindex, mod_cgi, mod_dir,
                          mod_imap, mod_include, mod_info, mod_negotiation,
                          mod_rewrite, and mod_status.
Translate Path (#8)       mod_alias, mod_rewrite, and mod_userdir.
Verify User ID (#9)       mod_auth, mod_auth_anon, mod_auth_db, mod_auth_dbm,
                          and mod_digest.
Verify User Access (#10)  mod_auth, mod_auth_anon, mod_auth_db, mod_auth_dbm,
                          and mod_digest.
Check Access (#11)        mod_access
Check Type (#12)          mod_mime, mod_mime_magic, mod_negotiation, and
                          mod_rewrite.
Fixups (#13)              mod_alias, mod_cern_meta, mod_env, mod_expires,
                          mod_headers, mod_negotiation, mod_rewrite, mod_speling,
                          and mod_usertrack.
Logging (#14)             mod_log_agent, mod_log_config, and mod_log_referer.
Header Parse (#15)        (none)
Child Init (#16)          mod_rewrite and mod_unique_id.
Child Exit (#17)          mod_log_config.
Post-Read Request (#18)   mod_setenvif and mod_unique_id.
Once mod_info is embedded in the IBM HTTP Server, the information page,
as shown in Figure 36 on page 186, can simply be displayed by appending
server-info to the Web server’s root URL. Note that this may expose information
about the Web server that is not meant to be available to everybody. Thus,
mod_info is not normally made available on production Web servers.
In order to build the mod_info module (or any other modules), you will need to
have an ANSI C compiler installed on the system.
ClearModuleList
# Other AddModule lines here
AddModule mod_info.c
14. If your server is already running, you can restart it with the command:
# apachectl restart
If the IBM HTTP Server is not running, you can start it with the command:
# apachectl start
(The apachectl executable is located in /usr/lpp/HTTPServer/sbin.)
15. If access has been granted to the client you are using with the above
Allow directive, you should now be able to request the URL
http://<your server>/server-info. Replace <your server> with
the hostname of your Web server.
This chapter provides guidelines for users who would like to migrate from
their current Web server to the IBM HTTP Server. The descriptions that follow
cover the migration from the IBM Internet Connection Secure Server (ICSS),
Lotus Domino Go, and Netscape Communications Corporation’s FastTrack and
Enterprise Servers. Migration from other Web server products might be
similar to those mentioned above. It is, however, almost impossible to explain
every aspect of a migration, and thus, only the commonly used features are
described in detail later.
Before we go into the details in the subsequent sections, some of the most
obvious differences shall be listed here. Almost all commercial Web servers,
including the IBM Internet Connection Secure Server, Lotus Domino Go
Webserver, Netscape FastTrack and Netscape Enterprise Servers, have
remote administration features with graphical user interfaces. For the IBM
HTTP Server, such a feature might be available in a future release. A proxy
function is available only as an add-on module to the IBM HTTP Server that is
not included in the standard package. The Lotus Domino Go Webserver has
powerful Java-based log reporting tools, a Java Servlet engine, SNMP
management, a text search engine, multithreading, multiple language
support, and more. These features are not (yet) included in the current
version of IBM HTTP Server.
The main advantages of the IBM HTTP Server in comparison with other
commercial Web servers are its modular structure, its industry standard open
architecture (including the availability of source code), its flexible
configuration, and the large number of specialists skilled in it. Also, the
IBM HTTP Server supports virtual hosts, which Lotus Domino Go and ICSS do not.
9.1.1 Installation
You can install and run the IBM HTTP Server on the same machine that is
running the IBM ICSS or Lotus DGW, as long as you keep them on different
IP ports. For example, you can run Lotus DGW on the default port 80 and, at
the same time, evaluate the IBM HTTP Server on port 8080. After completion
The most common configuration directives are almost the same for the
subject Web servers. The following table, Table 17, lists the most important of
these directives.
Table 17. Basic Directives (Comparison)
Description                                    IBM HTTP Server   IBM ICSS/Lotus DGW
Log file for request logging (this feature     TransferLog       AccessLog
is optional for the IBM HTTP Server)
Default file name for requests that include    DirectoryIndex    Welcome
only a directory part in the URL (this is an
optional feature of the IBM HTTP Server)
More details about each of the directives mentioned above can be found in the
documentation of the corresponding Web server, or at the appropriate places
in this redbook.
The IBM HTTP Server module mod_rewrite allows you to migrate the Map
and Fail directives. The usage of module mod_rewrite is quite complex and
you should use it with caution. Below are some simple examples of how to
migrate Map and Fail directives.
The proxy module is not included in the current version of the IBM HTTP Server
(see 3.1, “Product Contents” on page 33), so we do not describe migration of the
proxy functionality.
The IBM HTTP Server has more advanced virtual hosting features than the
IBM ICSS/Lotus DGW Server, such as separate log files, access restrictions,
and error messages for each virtual host. You might want to take advantage
of them when migrating.
The access-control features are quite similar among all subject Web servers,
but the syntax is so different that it is very difficult to translate a
configuration directly.
The following example will give you some ideas about how to migrate
authentication and access-control configuration. The IBM HTTP Server
modules mod_auth and mod_access are required for that configuration.
Take into account that password and group file formats are incompatible
between the Web servers and need to be migrated or recreated manually.
The following example shows how to create custom log files that are
equivalent to the IBM ICSS/Lotus DGW referer and agent
log files.
CustomLog /webserver/logs/referer_log "%t \"%{Referer}i\""
CustomLog /webserver/logs/agent_log "%t \"%{User-agent}i\""
FastCGI scripts are also supported through the optional module mod_fastcgi
(see 10.2.3, “CGI Performance Considerations” on page 217).
The IBM HTTP Server and the IBM ICSS/Lotus DGW support the same
syntax for server-side includes. The feature sets differ only slightly, so you
should not have serious problems during migration.
This is an excerpt of an original file mapfile.txt for the IBM ICSS/Lotus DGW:
default http://www.CompanyA.com
rectangle (50,25) (80,75) http://www.CompanyA.com/sales
circle (100,130) 20 http://www.CompanyA.com/products
polygon (10,10) (40,20) (5,15) http://www.CompanyA.com/support
This is the edited and renamed file mapfile.map for the IBM HTTP Server:
default http://www.CompanyA.com
rect http://www.CompanyA.com/sales 50,25 80,75
circle http://www.CompanyA.com/products 100,130 100,150
poly http://www.CompanyA.com/support 10,10 40,20 5,15
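For larger map files, the manual edit shown above could be automated. The following Python sketch is not part of either product; it only assumes the two line formats illustrated in the excerpts (shape keyword, parenthesized points, and the URL last for the IBM ICSS/Lotus DGW; shape keyword, URL, then bare coordinate pairs for the IBM HTTP Server). The function name convert_line is hypothetical. Note that a circle is converted from center and radius to center and one point on the circle.

```python
import re

# Shape keywords: ICSS/DGW name -> mod_imap name.
SHAPES = {"rectangle": "rect", "circle": "circle", "polygon": "poly"}
POINT = re.compile(r"\((\d+),(\d+)\)")


def convert_line(line):
    """Convert one ICSS/DGW map line to the mod_imap line format."""
    parts = line.split()
    keyword, url = parts[0], parts[-1]
    if keyword == "default":
        return f"default {url}"
    # Look for points only between the keyword and the URL.
    middle = " ".join(parts[1:-1])
    points = [(int(x), int(y)) for x, y in POINT.findall(middle)]
    if keyword == "circle":
        # Source format: center and radius. Target: center and a point on the circle.
        radius = int(parts[2])
        cx, cy = points[0]
        points.append((cx, cy + radius))
    coords = " ".join(f"{x},{y}" for x, y in points)
    return f"{SHAPES[keyword]} {url} {coords}"
```

Applied to the circle line of the excerpt, the sketch yields the center 100,130 followed by the point 100,150, which matches the hand-edited file.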
9.2.1 Installation
You can install and run the IBM HTTP Server on the same machine that is
running a Netscape Web server as long as you keep them on different IP
ports. For example, you can run a Netscape Web server on the default port 80
and, at the same time, evaluate and configure the IBM HTTP Server on port
8080. After some time, you can sunset the Netscape Web server and, at the
same time, switch the IBM HTTP Server to the default port.
Note: The <ns-home> directory listed in the right column of Table 18 above can
be chosen upon installation. By default, it is either /usr/netscape/suitespot or
/usr/ns-home.
Description                                    IBM HTTP Server   Netscape Web servers
Log file for request logging (this feature     TransferLog       access (obj.conf)
is optional in the IBM HTTP Server)
Default file name for requests that include    DirectoryIndex    fn=find-index (obj.conf)
only a directory part in the URL (this is an
optional feature of the IBM HTTP Server)
Table 19 on page 200 shows some basic directives found in both servers,
mainly in the httpd.conf (IBM HTTP Server) and magnus.conf (Netscape Web
servers) files. The Netscape Web servers do not define every one of these
directives in magnus.conf, but they provide equivalent settings in other
files such as obj.conf and .nsconfig.
The .nsconfig file supports the following directives:
AddType         Assigns encoding to file extensions.
ErrorFile       Assigns error messages other than the default.
RequireAuth     Performs user authentication using a userfile.
RestrictAccess  Applies access control to resources.
This script converts all .nsconfig files to .htaccess files. Perl must also be
installed in order to run this. For more information, please refer to the
administrator guides or the documentation for the respective Netscape Web
server.
The IBM HTTP Server uses the Alias directive to perform the same function:
Alias /admin /webserver/admin
This is how it is implemented in the IBM HTTP Server using the Redirect
directive:
Redirect /other http://www.other.org
Redirect /other2 http://www.other.org/test/other2
<Object name="cgi">
ObjectType fn="force-type" type="magnus-internal/cgi"
Service fn="send-cgi"
</Object>
The IBM HTTP Server uses the ScriptAlias directive for the equivalent
implementation in the httpd.conf file as shown below:
The IBM HTTP Server offers more flexibility in defining any file types to be
CGI programs, as compared to the Netscape Web servers that restrict it to
only the three file types mentioned above. However, webmasters are
reminded of the effects of specifying file types as CGI programs, such as .exe
file extensions, that might be provided for download, rather than execution.
Also any non-CGI files with those extensions will result in errors when the
server processes them as CGI programs.
The IBM HTTP Server uses a directive called AddHandler in the httpd.conf file
to specify that files with the extension .cgi are to be run as CGI
programs. Ensure that the execution of CGI programs is enabled using the
Options directive, as shown below with the AddHandler directive:
Options ExecCGI
AddHandler cgi-script cgi
The equivalent notation for hardware virtual servers in the IBM HTTP Server
is IP-based virtual hosts (see also 5.1, “Virtual Hosts” on page 71) and the
following clip from the configuration file shows how it is implemented:
<VirtualHost www.CompanyA.com>
ServerName www.CompanyA.com
DocumentRoot /webserver/CompanyA/
</VirtualHost>
The equivalent notation for software virtual servers in the IBM HTTP Server is
name-based virtual hosts. This is how it is implemented in the httpd.conf file:
NameVirtualHost 1.2.3.4
<VirtualHost 1.2.3.4>
ServerName www.CompanyA.com
DocumentRoot /webserver/CompanyA/
</VirtualHost>
<VirtualHost 1.2.3.4>
ServerName www.CompanyC.com
DocumentRoot /webserver/CompanyC/
</VirtualHost>
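How the server chooses between the two name-based hosts above can be pictured as a lookup on the Host header sent by the browser. The following Python sketch is a simplified model, not the server's implementation; the function name is hypothetical, and the fallback to the first host listed for an unknown name reflects our understanding of Apache 1.3 behavior and should be verified against the virtual host documentation.

```python
# Name-based virtual hosts for 1.2.3.4, in configuration order:
# (ServerName, DocumentRoot)
VHOSTS = [
    ("www.CompanyA.com", "/webserver/CompanyA/"),
    ("www.CompanyC.com", "/webserver/CompanyC/"),
]


def select_document_root(host_header):
    """Pick the DocumentRoot for a request that arrived on 1.2.3.4."""
    # Drop an optional port and compare case-insensitively.
    name = host_header.split(":")[0].lower()
    for server_name, document_root in VHOSTS:
        if server_name.lower() == name:
            return document_root
    # Assumption: an unmatched name is served by the first host listed.
    return VHOSTS[0][1]
```

A browser that does not send a Host header at all (for example, an old HTTP/1.0 client) cannot be distinguished by name, which is the main limitation of name-based virtual hosts.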
The corresponding configuration in the IBM HTTP Server httpd.conf file is
as follows:
<VirtualHost 1.2.3.4>
...
ServerName www.CompanyA.com
ErrorLog /webserver/CompanyA/logs/error_log
TransferLog /webserver/CompanyA/logs/access_log
...
</VirtualHost>
<VirtualHost 1.2.3.5>
...
ServerName www.CompanyC.com
...
ErrorLog /webserver/CompanyC/logs/error_log
TransferLog /webserver/CompanyC/logs/access_log
...
</VirtualHost>
The main consideration here for the IBM HTTP Server is whether or not the
virtual hosts share the same or different configuration files. The IBM HTTP
Server offers the flexibility to share configuration files based on either one or
multiple IP addresses. The webmaster needs to put the configuration
parameters inside the <VirtualHost> directive to enforce specific actions on
that particular host, or place them outside that directive to apply these
configurations to all the hosts defined.
For instance, this is how to configure two hosts to share common log files:
...
ErrorLog /webserver/logs/error_log
TransferLog /webserver/logs/access_log
<VirtualHost 1.2.3.5>
...
ServerName www.CompanyC.com
...
</VirtualHost>
If the configuration was created exclusively using the administration GUI, the
generated ACL files are different from any files supported by the IBM HTTP
Server, and thus, migration does require some individual work.
The .nsconfig file contains settings similar to those in the .htaccess file used
in the IBM HTTP Server. The following shows a fragment of an .nsconfig file used
by the Netscape Web servers:
<Files /staff/*>
RestrictAccess method="(GET|POST)" type="deny"
RestrictAccess method="(GET|POST)" type="allow" ip="1.2.3.*"
RestrictAccess method="(PUT)" type="deny"
RestrictAccess method="(PUT)" type="allow" ip="*"
RequireAuth file=/webserver/security/users2 realm=StaffOnly
userlist=valid-user
</Files>
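The equivalent access control on the IBM HTTP Server can be
implemented with the help of the modules mod_auth and mod_access. The
following sample enforces access control using directives equivalent to
those above. Note that a <Location> section is valid only in the server
configuration file; in an .htaccess file placed in the staff directory, the
enclosed directives would be used without the <Location> wrapper: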
<Location /staff>
AuthName StaffOnly
AuthType Basic
AuthUserFile /webserver/security/users2
AuthGroupFile /webserver/security/groups2
<Limit GET POST>
Order deny,allow
Deny from all
Allow from 1.2.3
require valid-user
Satisfy all
</Limit>
<Limit PUT>
Order deny,allow
Allow from all
require group management accounting
</Limit>
</Location>
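The way the GET and POST rules above combine can be modeled in a few lines. The following Python sketch is a simplified model of Order deny,allow together with Satisfy all, written for illustration; the function names are hypothetical, and only the "all" keyword and partial IP addresses are handled.

```python
def matches(ip, spec):
    """True if the client IP matches 'all' or a partial address such as '1.2.3'."""
    return spec == "all" or ip == spec or ip.startswith(spec.rstrip(".") + ".")


def host_allowed(ip, deny, allow):
    """Order deny,allow: allowed by default, and an Allow match overrides a Deny match."""
    denied = any(matches(ip, spec) for spec in deny)
    allowed = any(matches(ip, spec) for spec in allow)
    return allowed or not denied


def get_post_permitted(ip, authenticated):
    """Satisfy all: both the host check and the user check must pass."""
    return host_allowed(ip, deny=["all"], allow=["1.2.3"]) and authenticated
```

In this model, a client at 1.2.3.77 with a valid user ID and password is admitted, whereas the same client without valid credentials, or an authenticated client from any other network, is refused.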
When either .nsconfig or .htaccess files are used in the Netscape Web
servers, the userfile and the groupfile are compatible with the ones used by the
IBM HTTP Server, provided that they are of the following simple form:
Userfile format:
username:password
username:password
...
Groupfile format:
The access log files of the Netscape Web servers can be configured in three
formats: common logfile format, flexible log format, or customizable format.
Generally, the configuration settings of any of these are found in the obj.conf
file. The following is an example of the default settings for access logging of
the Netscape Web servers:
Init fn="flex-init" access="/nscape/suitespot/httpd-CompanyA/logs/access"
format.access="%Ses->client.ip% - %Req->vars.auth-user% [%SYSDATE%]
\"%Req->reqpb.clf-request%\" %Req->srvhdrs.clf-status%
%Req->srvhdrs.content-length%"
The following shows the default settings present in the httpd.conf file of the
IBM HTTP Server:
LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog /webserver/log/access_log common
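Because both servers can write the Common Log Format, the same analysis scripts can be reused after migration. As a rough illustration, the following Python sketch splits one line written with the common format above into its fields. The regular expression and the function name are our own, and the sample line used in the usage note is invented.

```python
import re

# Fields of "%h %l %u %t \"%r\" %>s %b", in order.
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf(line):
    """Return the fields of one Common Log Format line, or None if it does not match."""
    match = CLF.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # %b logs "-" when no bytes were sent.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry
```

For the invented line 1.2.3.4 - jdoe [10/Mar/1999:13:55:36 +0100] "GET /index.html HTTP/1.0" 200 2326, the sketch returns the host 1.2.3.4, the user jdoe, the status 200, and a size of 2326 bytes.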
For the Netscape Web servers, there is a log-analyzer tool to analyze the log
data collected. As for IBM HTTP Server, log files can be configured to be
handled by any other third party utilities for analysis. Such utilities are
available on the Web (see also 5.7.3, “Customizing the Log Format” on page
101). The mod_log_config module supports the TransferLog directive for the
creation of log files that adhere to the Common Log Format (CLF) standard,
Or do it directly as shown:
CustomLog /<log dir>/agent_log "%t \"%{User-agent}i\""
CustomLog /<log dir>/referer_log "%t \"%{Referer}i\""
The IBM HTTP Server also supports the creation of separate log files for
each virtual host defined on the system. Likewise, it is not difficult to locate
these log files on the Netscape Web servers since each host is defined under
a separate directory tree beneath the admin server, and their respective
obj.conf file states the rules and location of these log files.
FastCGI scripts are also supported by the optional module mod_fastcgi (see
also 10.2.3, “CGI Performance Considerations” on page 217).
As for the IBM HTTP Server, image maps are implemented using the
mod_imap module. To enable image maps, use the AddHandler imap-file
map directive. For more information about image mapping in the IBM HTTP
Server, see 10.5, “Image Maps” on page 220.
The Web has quickly evolved beyond a medium that offers only static HTML
pages. Business applications, such as a
classic merchandise catalog and ordering application, present very
attractive business opportunities for most companies.
10.1 Concepts
Each day the Internet becomes more and more business-oriented. To use the
World Wide Web for business, static HTML pages are not enough. Special
applications are required to process users’ input and integrate the Web
server with other information systems. Such programs that extend the Web
beyond passive content browsing are called Web applications.
Usually, Web applications process user data supplied by HTML forms (see
the example in the following figure, Figure 37 on page 214). Forms can
contain entry fields, selection lists, check boxes and other controls. Each form
has a specific URL of an application that handles the form data. After filling in
the form, a user is typically required to click on a submit button on the form.
At that time, the browser sends the form data in a well defined format to the
Web server. The server subsequently decodes the data and passes it on to a
specified program for processing. The application may respond with an HTML
page that was specifically constructed for that particular instance.
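To make the "well defined format" more concrete: browsers commonly submit form fields as URL-encoded name=value pairs joined by ampersands, with spaces sent as plus signs and special characters as percent escapes. The following Python sketch decodes such a string with the standard library; the field names and values are invented for this example.

```python
from urllib.parse import parse_qs


def decode_form(data):
    """Decode URL-encoded form data into a dictionary of value lists."""
    # keep_blank_values=True retains fields the user left empty.
    return parse_qs(data, keep_blank_values=True)
```

For instance, decoding name=John+Smith&item=catalog%2F42&qty= gives the name John Smith, the item catalog/42, and an empty qty field. Each value is a list because a form may send the same field name more than once.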
The basic principle of CGI is that a Web server passes client request
information to CGI programs in system environment variables (and in some
cases through standard input or command line arguments) and all standard
output of CGI programs is returned to Web clients. This allows for easy
writing of Web applications in almost any programming language, but has
some performance and security drawbacks (see 10.2.3, “CGI Performance
Considerations” on page 217 and 10.2.4, “CGI Security” on page 218).
Another deficiency to overcome (if necessary) is that the HTTP protocol itself
is stateless and any application that requires more than one step to complete
a task needs to send all related information in each step to keep track of the
steps.
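The data flow described above can be sketched as follows. This is a minimal Python illustration rather than a production program: it relies on the standard CGI variables REQUEST_METHOD, QUERY_STRING, and CONTENT_LENGTH, while the function name and the form of the reply are assumptions made for this example.

```python
import io


def cgi_response(environ, stdin):
    """Build a CGI response that reports how the request data arrived."""
    method = environ.get("REQUEST_METHOD", "GET")
    if method == "POST":
        # POST data arrives on standard input; CONTENT_LENGTH gives its size.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        data = stdin.read(length)
    else:
        # GET data arrives in the QUERY_STRING environment variable.
        data = environ.get("QUERY_STRING", "")
    body = f"method={method}\ndata={data}\n"
    # Everything written to standard output goes to the client:
    # a header block, a blank line, then the document.
    return "Content-Type: text/plain\n\n" + body


# Simulated POST request with seven bytes of form data.
example = cgi_response(
    {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "7"},
    io.StringIO("a=1&b=2"),
)
```

In a real CGI program, the function would be called with os.environ and sys.stdin, and its result written to sys.stdout. Passing the environment and input stream as parameters keeps the logic testable outside the Web server.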
Although widely used in many Web applications, at the time of writing this
book there was no official CGI standard available. Two commonly referenced
If you want to declare that all files in a particular directory are CGI programs,
use the ScriptAlias directive, as in the following example:
ScriptAlias /cgi-bin/ /usr/lpp/HTTPServer/share/cgi-bin/
If you would like to associate a filename extension with CGI programs, use
the AddHandler directive. For example:
AddHandler cgi-script cgi
This directive instructs the Web server to treat each file with the extension
.cgi as a CGI program and it will, therefore, attempt to execute it rather than
simply read and send it back to the browser as it would with an ordinary
HTML file.
The ExecCGI option can be used anywhere in the main configuration file and
also in .htaccess files.
If you are uncertain about the variables and/or their values, you can replace
your CGI program temporarily with this Perl script to examine the variable in a
certain context. Perl needs to be installed on your system in the /usr/local/bin
directory in order to make this script work. If you do not have Perl installed,
the following Korn shell script does almost the same thing:
#!/usr/bin/ksh
One of the security precautions is to run all CGI programs under some
special user other than root. This user is specified in the configuration file by
the User directive. By default, it is set to user nobody, but it is better to create
a special user for this purpose (see also 3.6, “Initial Setup” on page 42).
10.3 Modules
Modules are a very powerful way to create Web applications, though not
necessarily an easy one. Modules can influence client request processing at
almost any step. The IBM HTTP Server has special C language APIs (Application
Programming Interfaces) for additional modules. A comprehensive API
description can be found at http://www.apache.org/docs/misc/API.html. Also,
Chapter 8, “Building HTTP Server Modules” on page 177 gives you an
introduction to writing modules.
Because of their complexity, API programs (modules) are not widely used to
create Web applications directly. A more common way is to write modules
that provide their own, simpler, platform-independent and task-oriented API.
Examples of such modules are the Perl language interpretation module
mod_perl (see 10.6.3, “Perl” on page 222) or the IBM WebSphere Application
SSI commands control the inclusion of other files into HTML files,
conditionally remove parts of HTML files, and even execute external
programs. For example, the following SSI instruction will be replaced with the
content of the file header.html before the HTML document is sent to the
client:
<!--#include virtual="header.html" -->
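The effect of the include command can be pictured as a text substitution performed before the page is sent. The following Python sketch models only this one command and is not how mod_include is implemented; the function name and the read_file callback are hypothetical. It assumes that the directive starts with the characters <!--# without intervening white space, which is the form mod_include recognizes.

```python
import re

# Matches <!--#include virtual="..." --> and captures the path.
INCLUDE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')


def expand_includes(html, read_file):
    """Replace each include command with the text returned by read_file(path)."""
    return INCLUDE.sub(lambda match: read_file(match.group(1)), html)
```

The read_file argument is any function that returns the content for a given path, so the sketch can be tried with a dictionary of sample files instead of a real document tree.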
These examples certainly do not show all the powerful capabilities of SSI.
More SSI features can be found in the documentation for the mod_include
module.
Then, the SSI feature can be enabled by the following directive in the
corresponding section of the configuration file or the .htaccess file (more
about the Options directive can be found in 4.10, “Options” on page 67):
Options +Includes
With this, every file within the scope of the definition with an extension of
.shtml will be processed by the SSI module before sending it to the Web
client.
Note that the Options All directive allows SSI processing, including shell
commands and CGI programs.
Server-side image maps have existed since HTML Version 2. In this method,
pixel coordinates of a mouse click (in relation to an image) are sent to the
Web server and resolved to a URL there. The IBM HTTP Server uses the
standard module mod_imap to handle server-side image maps.
Here, map is the file extension of image map files. The contents of such a file
describe regions on an image, the related URLs, and other instructions. Regions
can have rectangle, circle, polygon, or point shapes. When the click falls
within none of the other regions, the point region closest to it is used. A
default URL can be specified, which is used when the given coordinates do not
fall within any region and no point regions are defined. The origin of the
coordinates (x, y) is the upper-left corner of the image.
Let’s look at an example. The image map file sample.map in the document
root directory of the Web server www.CompanyA.com contains:
default /help/images.html
rect /products.html 50,25 80,75
poly http://www.CompanyB.com/info 10,10 40,20 5,15
circle mailto:[email protected] 100,130 100,150
The current HTML document on the Web browser contains the following
fragment, displaying some sort of a graphic:
<A HREF="/sample.map">
<IMG ISMAP SRC="/menu.gif"></A>
If the user clicks on the coordinates 60,50 within that graphic, the request for
http://www.CompanyA.com/sample.map?60,50 will be redirected to the URL
http://www.CompanyA.com/products.html.
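The resolution of a click to a URL can be sketched as follows. This Python sketch is a simplified model and not the mod_imap source: it handles only default, rect, circle, and poly lines, checks regions in file order, and returns the URL exactly as written in the map file (the real module would turn a relative URL into a full redirect). In the usage note, /contact.html is a made-up URL for the circle region.

```python
import math


def parse_map(text):
    """Parse map file lines into (shape, url, points) tuples."""
    regions = []
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue
        points = [tuple(int(v) for v in p.split(",")) for p in parts[2:]]
        regions.append((parts[0], parts[1], points))
    return regions


def in_poly(x, y, pts):
    """Ray-casting test: count polygon edges crossed by a ray to the right."""
    inside = False
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def resolve(regions, x, y):
    """Return the URL of the first region containing (x, y), else the default."""
    default = None
    for shape, url, pts in regions:
        if shape == "default":
            default = url
        elif shape == "rect":
            (x1, y1), (x2, y2) = pts
            if min(x1, x2) <= x <= max(x1, x2) and min(y1, y2) <= y <= max(y1, y2):
                return url
        elif shape == "circle":
            # Given as the center and one point on the circle.
            (cx, cy), (px, py) = pts
            if math.hypot(x - cx, y - cy) <= math.hypot(px - cx, py - cy):
                return url
        elif shape == "poly" and in_poly(x, y, pts):
            return url
    return default
```

With a map consisting of the default, rect, and poly lines of sample.map plus a circle line pointing to /contact.html, a click at 60,50 resolves to /products.html, a click at 100,140 to /contact.html, and a click at 200,200 to the default /help/images.html.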
Of course, there are more image map configuration possibilities. For a more
detailed description, we refer you to the original mod_imap module
documentation.
10.6.1 C
C is a general-purpose programming language. It can be used for writing CGI
programs and IBM HTTP Server modules. C programming requires more
programming skill than script programming. Generally, it takes more time to
develop and debug programs in C than in scripting languages. On
the other hand, C is more suitable for big and complex projects. The whole
IBM HTTP Server and the additional modules are written in C. The performance
of compiled C programs is usually much better than that of scripts.
10.6.3 Perl
Probably the most popular script language for Web application development
is Perl. Besides the usual advantages of interpreted languages such as fast
and easy programming, Perl has powerful textual data manipulation
capabilities and extensible features. There are many Perl-extension libraries
available on the Internet.
To improve Perl script performance on the IBM HTTP Server, the mod_perl
module was developed, which has a built-in Perl language interpreter.
More information about the Perl language, extensions and the mod_perl
module can be found at http://perl.apache.org, or http://www.perl.com.
10.6.5 PHP
PHP (Hypertext Preprocessor) is an HTML-embedded scripting language
similar to server-side includes, but much more powerful. It allows quick and
easy development of dynamic HTML pages. In general, PHP has better
performance than CGI scripts.
10.6.6 REXX
REXX is a procedural programming language created for IBM mainframe
computers and later ported to other platforms. It is quite popular in IBM-
related environments. Similar to other interpreted languages, REXX
programming is relatively easy and performance is average.
The Java servlet engine is implemented in the IBM HTTP Server as the
dynamically loaded module mod_app_server. This ensures better
performance compared to the CGI approach and allows stronger control of
client request processing.
This publication is intended to help professionals who need to plan for and
implement the IBM HTTP Server on RS/6000 based on the Apache server.
The information in this publication is not intended as the specification of any
programming interfaces that are provided by the WebSphere or the Apache
product. See the PUBLICATIONS section of the IBM Programming
Announcement for the IBM WebSphere Application Server V2.0 product for
more information about what publications are considered to be product
documentation.
IBM may have patents or pending patent applications covering subject matter
in this document. The furnishing of this document does not give you any
license to these patents. You can send license inquiries, in writing, to the IBM
Director of Licensing, IBM Corporation, 500 Columbus Avenue, Thornwood,
NY 10594 USA.
Licensees of this program who wish to have information about it for the
purpose of enabling: (i) the exchange of information between independently
created programs and other programs (including this one) and (ii) the mutual
use of the information which has been exchanged, should contact IBM
Corporation, Dept. 600A, Mail Drop 1329, Somers, NY 10589 USA.
The information contained in this document has not been submitted to any
formal IBM test and is distributed AS IS. The information about non-IBM
("vendor") products in this manual has been supplied by the vendor and IBM
assumes no responsibility for its accuracy or completeness. The use of this
information or the implementation of any of these techniques is a customer
responsibility and depends on the customer’s ability to evaluate and integrate
Any pointers in this publication to external Web sites are provided for
convenience only and do not in any manner serve as an endorsement of
these Web sites.
Reference to PTF numbers that have not been released through the normal
distribution process does not imply general availability. The purpose of
including these reference numbers is to alert IBM customers to specific
information relative to the implementation of the PTF when it becomes
available to each customer according to the normal IBM PTF distribution
process.
Java and all Java-based trademarks and logos are trademarks or registered
trademarks of Sun Microsystems, Inc. in the United States and/or other
countries.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks or
registered trademarks of Microsoft Corporation.
The publications listed in this section are considered particularly suitable for a
more detailed discussion of the topics covered in this redbook.
A good source for Requests for Comments (RFCs) can be found at:
http://www.isi.edu/rfc-editor/rfc.html
235
FrontPage (Microsoft) 96
FTP 95
future of Apache 8

G
General Parallel File System (GPFS) 174
GIF images 38
graphic images 13
Group 43, 56
gskre301 33
gskrf301 33
gskru301 33
GUI (future) 9
gunzip 187
gzip 125

H
handlers 177
  type-map 87
handshake (SSL) 130
hash 117
HeaderName 82
history of Apache 1
hooks 177
host command 75
HostnameLookups 97, 115, 162
htconvert 202
htdigest 38, 128
htdocs 38
HTML forms 213
htpasswd 38, 124, 195
HTTP 115
HTTP Engine 31
HTTP server 2
HTTP/1.0 118, 228
HTTP/1.1 3, 12, 72, 74, 78, 127, 160, 228
http_server.base 33
http_server.modules 33
http-analyze 103
httpd 38
HTTPd (NCSA) 1
httpd.conf 6, 38, 55, 57, 116, 143, 158, 179
httpd.conf.sample.ssl 38, 146
hypertext 1

I
IBM HTTP Server (introduction) 6
icons 38
ICSS 191
Ident daemon 98
IdentityCheck 98
IETF 97, 129, 130
ifconfig 75
IfDefine 62
IfModule 62
ikeyman 39, 136
image maps 197, 220
IncludesNOEXEC 86
index.html 13, 80
IndexIgnore 83
indexing (directory) 80
IndexOptions 83, 84
  FancyIndexing 81
  ScanHTMLTitles 84
  SuppressHTMLPreamble 83
inetd 54, 55
init 52
inittab 52
installation 39
  filesets 33
  prerequisites 36
installp image 39
Internet 1, 111
Internet Architecture Board (IAB) 130
Internet Explorer (Microsoft) 78, 90
Internet Service Provider (ISP) 72, 113
intranet 111
iostat 166
IP-based virtual hosts 72
iptrace 169
ISAPI 3
ISO 111

J
JAVA_HOME 136

K
KeepAlive 160
KeepAliveTimeout 161
Kerberos 126
key database 136
key management 112
Korn shell 216
netstat 168
network I/O 158
network interface 72, 75
NFS 86
nobody user account 42, 150
nonce 128
non-repudiation 112
not found (error) 92
ns-admin.conf 199
nslookup command 75

O
obj.conf 200
ObjectType 211
Options 163
  ExecCGI 86
  Includes 86
  IncludesNOEXEC 86, 150
  Indexes 80, 85
  MultiViews 89
options 67
OS/2 Warp 3

P
packaging 33
paging space 157
PassEnv 197, 210
Perl 95, 125, 216, 222
persistent connections 12
PHP 223
PidFile 50
ping 75
PKCS10 134
Port 55
POST method 95
preface xiii
prerequisites 36
privacy 112, 130
Private Communication Technology (PCT) 129
processes (main, children) 43
product packaging 33
proxy 106
proxy client configuration 108
proxy server products 108
ps 43, 44, 168
PTX (Performance Toolbox) 168
public-key cryptography 117
PUT method 95

R
rc.httpd 52
rc.shutdown 53
RC2 135
RC4 135
README file 38
Readme.httpserver 37
ReadmeName 82
recommended directory structure 49
redirection 202
request phase (hooks) 179
Require 120, 122
restarting (the IBM HTTP Server) 54
retina 114
RFC 228
  1945 92
  2068 92
  2291 97
  931 98
  draft for CGI 215
RLimitCPU 162
RLimitMEM 162
RLimitNPROC 162
robot 100
robots.txt 100
root authority 42
rotatelogs 101, 103
RSA 135

S
sar 167
Satisfy 122
scalability 171
scope 61, 116
Script 96
ScriptAlias 203, 215, 218
sections 61
  parsing rules 64
security 111
  .htaccess file 123
  authorization 115
  basic authentication 118
  basic elements 111
  CGI programs 218
  cryptography 117
  digest authentication 127
  logical 113
  physical 112
T
TCP/IP setup 75
ITSO Redbook Evaluation
IBM HTTP Server Powered by Apache on RS/6000
SG24-5132-00
Your feedback is very important to help us maintain the quality of ITSO redbooks. Please complete
this questionnaire and return it using one of the following methods:
• Use the online evaluation form found at http://www.redbooks.ibm.com
• Fax this form to: USA International Access Code + 1 914 432 8264
• Send your comments in an Internet note to [email protected]
Please rate your overall satisfaction with this book using the scale:
(1 = very good, 2 = good, 3 = average, 4 = poor, 5 = very poor)
Was this redbook published in time for your needs? Yes___ No___