The Apache Modelling Project

Bernhard Gröne
Andreas Knöpfel
Rudolf Kugel
Oliver Schmidt

FMC

January 24, 2008

[Cover figure: Block diagram of the Apache HTTP Server: a master server and child servers 1..N, a scoreboard for server status in shared memory, global configuration data and local configuration data (.htaccess), documents, scripts and files, the administrator, and HTTP clients; the master server reacts to restart and graceful-restart signals (HUP/USR1).]
This document can be found at the Fundamental Modeling Concepts Project web site:
HTML: http://www.fmc-modeling.org/projects/apache
PDF: http://www.fmc-modeling.org/download/projects/apache/the_apache_modelling_project.pdf
Thanks to:
Robert Mitschke and Ralf Schliehe–Diecks for preparing and editing material for this docu-
ment.
Copyright (c) 2002–2004 Bernhard Gröne, Andreas Knöpfel, Rudolf Kugel, Oliver Schmidt.
Permission is granted to copy, distribute and/or modify this document under the terms of
the GNU Free Documentation License, Version 1.2 or any later version published by the Free
Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
Texts. A copy of the license is included in appendix D entitled "GNU Free Documentation
License".
Abstract
This document presents an introduction to the Apache HTTP Server, covering both an
overview and implementation details. It presents results of the Apache Modelling Project
done by research assistants and students of the Hasso–Plattner–Institute in 2001, 2002 and
2003. The Apache HTTP Server was used to introduce students to the application of the
modeling technique FMC, a method that supports transporting knowledge about complex
systems in the domain of information processing (software as well as hardware).
After an introduction to HTTP servers in general, we will focus on protocols and web tech-
nology. Then we will discuss Apache, its operational environment and its extension capabil-
ities — the module API. Finally we will guide the reader through parts of the Apache source
code and explain the most important pieces.
Contents
1 Introduction 1
1.1 About this document . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 The FMC Philosophy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 The modeling project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Sources of Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 HTTP Servers 4
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Tasks of an HTTP Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.3 Protocols and Standards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3.1 RFCs and other standardization documents . . . . . . . . . . . . . . . . 7
2.3.2 TCP/IP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3.3 Domain Name Service (DNS) . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3.4 HTTP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4 Access Control and Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4.1 Authorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4.2 Authentication methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4.3 HTTPS and SSL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.5 Session Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.5.1 HTTP — a stateless protocol . . . . . . . . . . . . . . . . . . . . . . . . 23
2.5.2 Keep the state . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.6 Dynamic Content . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.6.1 Server–side Scripting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4 Inside Apache 53
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.2 Structure of the Source Distribution . . . . . . . . . . . . . . . . . . . . . . . . 54
4.2.1 Apache 1.3.17 source distribution . . . . . . . . . . . . . . . . . . . . . . 54
4.2.2 Apache 2.0.45 source distribution . . . . . . . . . . . . . . . . . . . . . . 55
4.3 Multitasking server architectures . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.3.1 Inetd: A common multitasking architecture . . . . . . . . . . . . . . . 56
4.3.2 Overview — Apache Multitasking Architectures . . . . . . . . . . . . . 58
4.3.3 The Preforking Multiprocessing Architecture . . . . . . . . . . . . . . 58
4.3.4 Apache Multitasking Architectures and MPMs . . . . . . . . . . . . . . 73
4.3.5 Win32/WinNT MPM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.3.6 Worker MPM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.3.7 Others MPMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.4 The Request–Response Loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.4.2 Waiting for connection requests . . . . . . . . . . . . . . . . . . . . . . . 80
4.4.3 Waiting for and reading HTTP requests . . . . . . . . . . . . . . . . . . 82
4.4.4 Process HTTP Request . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.5 The Configuration Processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.5.1 Where and when Apache reads configuration . . . . . . . . . . . . . . 85
4.5.2 Internal data structures . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.5.3 Processing global configuration data at start–up . . . . . . . . . . . . . 89
4.5.4 Processing configuration data on requests . . . . . . . . . . . . . . . . 94
B Sources 118
B.1 Simple HTTP Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
Bibliography 135
Glossary 136
Index 139
List of Figures
4.1 Directory structure of the Apache HTTP Server 1.3.17 source distribution . . . 54
4.2 Directory structure of the Apache HTTP Server 2.0.45 source distribution . . . 55
4.3 Multiprocessing Architecture of an inetd server . . . . . . . . . . . . . . . . . . 56
4.4 Behavior of a multiprocessing server . . . . . . . . . . . . . . . . . . . . . . . . 57
4.5 The leader–followers pattern used in the preforking server architecture . . . . 59
4.6 The Apache 2.0 Preforking MPM . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.7 Overview: The behavior of Apache . . . . . . . . . . . . . . . . . . . . . . . . 61
4.8 Details of the behavior of Apache . . . . . . . . . . . . . . . . . . . . . . . . . 63
Chapter 1

Introduction

1.2 The FMC Philosophy
A model is an abstraction of the software system that helps us to think and talk about it.
Everyone dealing with a software systems forms a unique model of the system in his or her
mind. Division of labor works best if those models resemble each other.
The Fundamental Modeling Concepts (FMC) help in transporting a model of a software
system from one human being to another. That is the primary objective. Visit our web site
at http://www.fmc-modeling.org to learn more about FMC.
1.3 The modeling project
The project was done at the Hasso–Plattner–Institute for Software Systems Engineering. The
idea behind the project was to show students a way to master the complexity of a real soft-
ware product. Apache is an HTTP server developed by many volunteers and used by many
companies all over the world. Most important, its source code is available. Therefore it is
an interesting object for a modeling project. We soon learned that a lot of information about
configuration and administration of Apache exists, but little about implementation details
aside from the source code.
More details of the seminars of 2001 and 2002 can be found at the Apache Modeling Portal
(http://www.fmc-modeling.org/projects/apache [3]).
A result of the project was a set of models describing various aspects of Apache and its
environment which formed the basis of this document.
1.4 Sources of Information
The first task in the modeling project was to find sources of information about Apache. Start-
ing from the Apache HTTP Server Project Web site (http://httpd.apache.org [2]), it is easy
to find information about usage and administration of Apache; take for example “Apache
Desktop Reference” by Ralf S. Engelschall [1]. Finding information about the implementa-
tion of Apache was much harder. A very good source of information was “Writing Apache
Modules with Perl and C” by Lincoln Stein & Doug MacEachern [4]. This book describes
the Module API of Apache and provides the information needed to create new modules.
Therefore it contains a description of the Apache API and the Request–Response–Loop (see
section 4.4).
The most important source of information was the source code distribution of Apache itself.
It contains documentation of various details and lots of comments in the source code. One
problem lies in the fact that the source code distribution provides one source base for many
system platforms. In the Apache 1.3.x source, big parts of code are only valid for one plat-
form (using the #ifdef preprocessor directive). For Apache 1.3.x, we only looked at the
code valid for the Linux platform. In 2002, we examined the Apache 2.0 source code which
provides a much better structure. The flexibility of the new Apache holds other obstacles
for the reader of its code: For example, the hook mechanism provides an extensible indirect
procedure call mechanism using an elegant macro mechanism. Without knowing this mech-
anism, you won’t find out which handlers will be registered for a hook (see section 3.3.2 for
details).
The only tool used for the analysis of the source code transformed the C source code into
a set of syntax highlighted and hyperlinked HTML files. They are available at our project
site¹.

¹ Start with “Structure of the Apache HTTP Server source distribution”: sources/
Chapter 2
HTTP Servers
2.1 Introduction
If you want to understand a software product, it is a good idea to know exactly what it is
supposed to do. In case of a server, you need information about the clients, resources and
protocols, too. As the server is part of a system, it is necessary to examine and understand
the system.
As a consequence, we will first give an introduction to HTTP servers in general, including
protocols like HTTP, TCP/IP and finally discussing subjects like dynamic content and web
applications.
Section 2.2 gives an overview of the system consisting of the HTTP Server, clients and re-
sources. It presents a model of a simple HTTP server that could be implemented in a small
exercise. Section 2.3 deals with protocols and standards related to the HTTP server. The
most important protocol is HTTP. All further sections describe various aspects of the sys-
tem like authentication, security or using HTTP for web applications. They just provide an
overview as there are enough documents going into detail.
2.2 Tasks of an HTTP Server
[Figure 2.1: A simple HTTP server system and its behavior. The web browser establishes a connection to the HTTP server and sends a GET request; the server processes the request, sends the resource from its files and closes the connection; the browser processes the resource, requests any missing required resources and finally renders the page.]
On the right–hand side, figure 2.1 also depicts what happens in the system: The user enters
a URL in the browser or clicks on a hyperlink. The Web Browser extracts the server address
from the URL or link and establishes a TCP/IP connection with the server. Using the con-
nection, the browser sends a GET request to the HTTP server. It extracts the resource name
to be requested from the URL or the link.
The HTTP Server reads the request and processes it (i.e. translates it to a file name). It sends
the resource in its response using the connection and finally closes the connection. The
browser examines the received data. An HTML document can contain links to further re-
sources needed to display it properly, like images or java applets. The browser has to request
them from the HTTP server, too, as they are not included in the HTML document. After re-
ceiving the last resource needed to display the HTML document, the browser successfully
renders the page and waits for further activities of the user.
Figure 2.1 shows a very simplified view of the system: It is easy to see that establishing and
closing TCP/IP connections for each resource is a waste of time and network bandwidth.
HTTP/1.0 offered the “keep alive” header field as an option; in HTTP/1.1 the default is to
keep the connection alive for a certain time of client inactivity (usually 15 seconds).
We also simplified the behavior of the browser: Since Netscape’s browser replaced Mosaic,
a browser usually reacts to its user’s activities while loading images.
The dynamic structure diagram in figure 2.2 shows the behavior of the server in more detail:
After initialization, the server enters the request–response loop. It waits for an incoming
request, examines it¹, maps the resource to a file and delivers it to the browser. Finally it
closes the connection to the client.
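To make this behavior more concrete, here is a heavily simplified C sketch of such a request-response loop. It is not the program from appendix B.1, only an illustrative approximation under assumed simplifications: it listens on port 8080 instead of 80, handles only GET requests that name a file, serves files relative to its working directory and omits almost all error handling.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void) {
    static const char ok[]  = "HTTP/1.0 200 OK\r\n\r\n";
    static const char err[] = "HTTP/1.0 404 Not Found\r\n\r\n";
    char req[4096], path[1024], buf[4096];
    struct sockaddr_in addr;
    int srv, conn;

    /* Initialization: set up the server port (8080 instead of 80, so no
       root privileges are needed for this sketch) */
    srv = socket(AF_INET, SOCK_STREAM, 0);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);
    bind(srv, (struct sockaddr *) &addr, sizeof(addr));
    listen(srv, 5);

    for (;;) {                                  /* the request-response loop */
        FILE *f;
        size_t n;
        ssize_t got;

        conn = accept(srv, NULL, NULL);         /* wait for a connection request */
        got = read(conn, req, sizeof(req) - 1); /* read the HTTP request */
        req[got > 0 ? got : 0] = '\0';

        /* map the resource to a file and deliver it, or report an error */
        if (sscanf(req, "GET /%1023s", path) == 1 && (f = fopen(path, "r")) != NULL) {
            write(conn, ok, sizeof(ok) - 1);    /* send response header */
            while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
                write(conn, buf, n);            /* send file */
            fclose(f);
        } else {
            write(conn, err, sizeof(err) - 1);  /* send error message */
        }
        close(conn);                            /* close the connection */
    }
}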
It is easy to implement an HTTP server displaying the behavior in figure 2.2 in about 100
lines of code (see appendix B.1 for the code). A productive HTTP server is far more complex
for various reasons:
¹ Only the processing of the GET method is shown in detail.

[Figure 2.2: Dynamic structure of the simple HTTP server. After initialization the server waits for connection requests on its server port (port 80), establishes the connection and reads the request, dispatches on the HTTP method, sends a response header and the requested file or an error message ("Error: File not found"), and finally closes the connection.]

Additional Features
• virtual hosts
• proxy functionality
• logging
2.3 Protocols and Standards

2.3.1 RFCs and other standardization documents
RFCs (Request For Comments) are a collection of notes about the Internet which started in
1969. These notes describe many standards concerning computer communication, network-
ing protocols, procedures, programs and concepts. All Internet standards are defined in the
RFCs.
To become a standard, an RFC has to traverse three steps called Internet Standards Track:
1. A specification first enters the track as a Proposed Standard.
2. Afterwards, it can become a Draft Standard which is nearly a standard with only minor
changes for special cases to come.
3. Finally, if it is widely used and considered to be useful for the community, it is raised
to the rank of a Standard.
The RFC Editor is responsible for the final document. The RFC Editor function is funded by
the Internet Society, http://www.isoc.org.
For further information on RFCs look at http://www.faqs.org/rfcs. This site also
contains links to all involved Internet Related Standard Organizations, like the W3C
(http://www.w3c.org).
2.3.2 TCP/IP
TCP/IP is the most commonly used network protocol worldwide and all nodes connected
to the Internet use it. TCP/IP consists of the 3 main protocols TCP (Transmission Control
Protocol), UDP (User Datagram Protocol) and IP (Internet Protocol). UDP is a less important pro-
tocol using the lower–level Protocol IP as well. For more details, have a look at “Computer
Networks” by Andrew Tanenbaum [6].
TCP and UDP are transmission protocols that use IP to transmit their data. While IP is
responsible for transmitting packets to the destination at best effort, TCP and UDP are used
to prepare data for sending by splitting them into packets.
TCP (Transmission Control Protocol) provides a connection for bi–directional communi-
cation between two partners using two data streams. It is therefore called a connection–
oriented protocol. Before sending or receiving any data, TCP has to establish a connection
channel with the target node. To provide the channel for the two data streams it has to split
the data into packets and ensure that packets arrive without error and are unpacked in the
proper order. That way an application using TCP does not have to take precautions for cor-
rupted data transfer. TCP will make sure data transfer is completed successfully or report
an error otherwise.
UDP (User Datagram Protocol) on the other hand is a much simpler technique for delivering
data packets. It just adds a header to the data and sends them to its destination, regardless
whether that node exists or expects data. UDP does not guarantee that packets arrive, nor
does it ensure they arrive in the order they were sent. If packets are transmitted between two
networks using different paths they can arrive in a wrong order. It’s the application that has
to take care of that. However, for applications needing fast transfer without overhead for
data that is still usable even if single packets are missing or not in order, UDP is the protocol
of choice. Most voice and video streaming applications therefore use UDP.
[Figure 2.3: Establishing and using a TCP connection. The server sets up its server port with the socket, bind and listen calls; the client creates a socket and requests a connection with connect; afterwards both sides can exchange any data with read and write.]
A TCP connection can only be established between two nodes: A client node sending a
connection request and a server node waiting for such connection requests. After receiving
a connection request, the server will respond and establish the connection. Then both nodes
can send and receive data through the connection, depending on the application protocol.
When finished, any node (but usually the client) can close the connection. This behavior is
shown in figure 2.3. Here you also see the operating system calls used to control the sockets
— see appendix A.3 for details.
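As an illustration of the client side of this sequence, the following C sketch connects to a server, sends a request and prints the reply; the server address and the request line are only example values.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void) {
    int s = socket(AF_INET, SOCK_STREAM, 0);           /* create the socket */
    struct sockaddr_in srv;
    char reply[1024];
    ssize_t n;

    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_port = htons(80);                          /* well-known HTTP port */
    inet_aton("123.123.123.123", &srv.sin_addr);       /* example address */

    connect(s, (struct sockaddr *) &srv, sizeof(srv)); /* request the connection */
    write(s, "GET / HTTP/1.0\r\n\r\n", 18);            /* any communication ... */
    while ((n = read(s, reply, sizeof(reply))) > 0)    /* ... in both directions */
        fwrite(reply, 1, n, stdout);
    close(s);                                          /* close the connection */
    return 0;
}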
Ports
An address of a TCP or UDP service consists of the IP address of the machine and a port
number. These ports enable hosts using TCP/IP to offer different services at one time and
to enable clients to maintain more than one connection to one single server. On a server,
ports are used to distinguish services. HTTP servers usually use the well–known port 80 to
offer their services. Other standard ports are 53 for DNS and 21 for FTP for example. In any
situation, every connection on a network has different pairs of target and source addresses
(IP address + port number).
IP addresses
IP, the Internet Protocol, is responsible for sending single packets to their destination nodes.
This is accomplished by assigning each computer a different IP address. Each IP address
consists of 32 bits usually represented in 4 dotted decimals each ranging from 0 through
255. An example for a valid IP address is 123.123.123.123. IP addresses can be distinguished
by the networks they belong to. The IP name–space is separated into networks by dividing
the 32 bits of the address into network and host address bits. This information is used for
routing packets to their destinations.
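The split into network and host address bits can be made explicit with a netmask. The following C sketch, using the example address from above and an assumed 24-bit netmask, computes the network part of the address by masking out the host bits:

#include <stdio.h>
#include <arpa/inet.h>

int main(void) {
    struct in_addr addr, mask, net;

    inet_aton("123.123.123.123", &addr);          /* example IP address */
    inet_aton("255.255.255.0", &mask);            /* 24 network bits, 8 host bits */
    net.s_addr = addr.s_addr & mask.s_addr;       /* keep only the network bits */
    printf("network part: %s\n", inet_ntoa(net)); /* prints 123.123.123.0 */
    return 0;
}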
2.3.3 Domain Name Service (DNS)

As covered by the previous section, each node on the Internet can be identified by a unique
IP address. Unfortunately IP addresses are numbers which are neither user friendly nor
intuitive.
Name–space
As a solution to that problem, DNS maps user-friendly names to IP addresses. Names used
in the DNS are organized in a hierarchical name-space. The name space is split up into
domains. The topmost domain is the . (Dot) domain. Domains below that, referred to
as first–level domains, split up the name–space by country. The first–level domains com,
net, org and edu are an exception to that rule. Originally they were intended to be used
in the United States only, but now are used all over the world. More first level domains
will be available. Individuals can register second–level domains within almost any of these
domains.
[Figure 2.4: The hierarchical DNS name–space and its zones: the root ("Dot") domain at the top, with domains such as .hpi.uni-potsdam.de. and .cs.mit.edu. below it.]
DNS Servers are organized hierarchically according to their responsibilities, forming a glob-
ally distributed database. Each DNS Server is an authority for one or multiple zones. Each
zone can contain one branch of the name–space tree. A zone itself can delegate sub–zones
to different name servers. The root name servers are responsible for the ’Dot’ domain and
delegate a zone for each first–level domain to the name servers of the corresponding country
domains. These on the other hand delegate zones for each name registered to name servers
supplied by or for the parties that own the second level domains. These name servers con-
tain entries for sub–zones and/or host–names for that zone. Figure 2.4 shows zones and
their dependencies.
DNS can be queried using either recursive or iterative lookup requests. When using an
iterative request, a DNS server will return either the requested entry, if it is responsible for
the corresponding zone or has the entry cached, or a referral to another name server that is
closer to the requested name.
In a recursive request the server has to look up the IP mapped to the name provided at all
cost. If the server is not responsible for the zone, it has to find out by using iterative requests
with other servers. If it does not have the information in question, it will first query the root
name server for the top level domain. It then will have to query the name servers that it is
subsequently referred to, until one server can successfully answer the request. If no server
can answer the request, the last one will report an error, which will then be handed to the
client that the recursive request came from. Most DNS servers responsible for the root or first
level zones will only reply to iterative requests. Figure 2.4 shows the traversal of a recursive
and subsequent iterative requests through the DNS server hierarchy.
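From an application's point of view this machinery is hidden behind the resolver library: the program simply asks for a name to be resolved, and the configured DNS server performs the necessary lookups on its behalf. A small C sketch (the host name is just an example):

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void) {
    struct addrinfo hints, *res, *p;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;                /* IPv4 only, for simplicity */
    hints.ai_socktype = SOCK_STREAM;

    /* Ask the resolver for the addresses registered for the name */
    if (getaddrinfo("www.fmc-modeling.org", "http", &hints, &res) != 0)
        return 1;
    for (p = res; p != NULL; p = p->ai_next) {
        struct sockaddr_in *sa = (struct sockaddr_in *) p->ai_addr;
        printf("%s\n", inet_ntoa(sa->sin_addr));
    }
    freeaddrinfo(res);
    return 0;
}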
2.3.4 HTTP
HTTP is the primary transfer protocol used in the World Wide Web. The first version of
HTTP that was widely used was version 1.0. After the Internet began to expand rapidly,
deficiencies of the first version became apparent. HTTP 1.1, the version used today, ad-
dressed these issues and extended the first version. Although HTTP doesn’t set up a session
(stateless protocol) and forms a simple request–response message protocol, it uses connec-
tions provided by TCP/IP as transport protocol for its reliability. HTTP is designed for and
typically used in a client–server environment.
With HTTP, each information item available is addressed by a URI (Uniform Resource
Identifier), which is an address used to retrieve the information. Even though URIs and
URLs historically were different concepts, they are now synonymously used to identify
information resources. URL is the more widely used term. An example for a URL is:
http://www.fmc-modeling.org/index.php. It would result in the following request:

GET /index.php HTTP/1.1
Host: www.fmc-modeling.org
In this example the Request URI as seen by the web server is /index.php.
HTTP data transfer is based on messages. A request from a client as well as a response from
a server is encoded in a message. Each HTTP message consists of a message header and can
contain a message body.
An HTTP header can be split up into 4 parts: the request line (for responses, the status line), general header fields, request or response header fields, and entity header fields.
Header and body of an HTTP message are always separated by a blank line. Most header
fields are not mandatory. The simplest request will only require the request line and, since
HTTP 1.1, the general header field "HOST" (see section 2.3.4). The simplest response only
contains the status line.
An example request/response message pair is shown in figure 2.5. The E/R diagrams in
figures 2.6 and 2.7 show more details of the structure of the HTTP messages.
The next sections cover aspects of HTTP including their header fields. Important header
fields not covered later are:
• "Content-length" / "Content-type" are fields to specify the length and the MIME
type of the information enclosed in the body. Any request or response including a
message body uses these header fields.
Request Message:

POST /cgi-bin/form.cgi HTTP/1.1                    (Header: Request line)
Host: www.myserver.com                             (Header: General)
Accept: */*                                        (Header: Request)
User-Agent: Mozilla/4.0                            (Header: Request)
Content-type: application/x-www-form-urlencoded   (Header: Entity)
Content-length: 25                                 (Header: Entity)
                                                   (Blank line)
NAME=Smith&ADDRESS=Berlin                          (Body (Entity))

Response Message:

HTTP/1.1 200 OK                                    (Header: Status line)
Date: Mon, 19 May 2002 12:22:41 GMT                (Header: General)
Content-type: text/html                            (Header: Entity)
Content-length: 2035                               (Header: Entity)
                                                   (Blank line)
<html>
<head>..</head>
<body>..</body>                                    (Body (Entity))
</html>

Figure 2.5: An example request/response message pair
[Figures 2.6 and 2.7: E/R diagrams showing the structure of HTTP request and response messages in more detail.]
HTTP Methods
HTTP methods are similar to commands given to an application. Depending on the method
used in the request, the server’s response will vary. Successful responses to some request
methods do not even contain body data.
The HTTP/1.1 standard defines the methods GET, POST, OPTIONS, HEAD, TRACE, PUT,
DELETE, CONNECT. The most often used methods are GET and POST.
GET is used to retrieve an entity of information without the need to submit additional
data in the message body. Before HTTP 1.0, GET was the only method to request
information.
POST is similar to a GET request, but POST always includes a message body in the re-
quest message to send information of any type to the server. Usually information
submitted via POST is used to generate dynamic content, for further processing,
or the information is simply stored to be used by other applications. POST is a
method that was introduced with HTTP version 1.0.
To send information to the server with GET, the client has to append it to the request URI.
That causes several difficulties however:
• The length of the request URI can cause problems at the server,
• most clients display the additional Request URI information to the user
Even though POST is the better way to transmit additional information to the server, some
applications use GET for that purpose, especially for small amounts of data or to allow book-
marking of the URL.
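For illustration, the form data from figure 2.5 sent with GET instead of POST would simply be appended to the request URI:

GET /cgi-bin/form.cgi?NAME=Smith&ADDRESS=Berlin HTTP/1.1
Host: www.myserver.com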
All other methods are rarely used and will only be covered briefly:
HEAD This method asks for the header of a reply only. It can be used when checking
for the existence of a certain document on a web server. The response will look
exactly as if requested via GET but will not include the message body
OPTIONS Using this method a client can query a server for the available methods and
options concerning a resource.
TRACE The TRACE method is similar to ping in TCP/IP. The request message contains
a mandatory header field called Max-Forwards. Each time the message passes
a proxy server, the value of that field is decremented by one. The server that
gets the message with a value of zero will send a reply. If the server to whom
the message is addressed gets the message, it will send a reply regardless of the
value in the max-forwards header field. Using a sequence of these requests, a
client can identify all proxy servers involved in forwarding a certain request.
PUT is used to transmit files to a server. This method is similar to the PUT command
used in FTP. This imposes a security threat and is therefore almost never used.
DELETE This method asks the server to delete the file addressed in the URI. Since this
method imposes a security risk no known productive HTTP servers support that
method. The DELETE method is very similar to the DELETE command in FTP.
CONNECT is a command used to create a secure socket layer (SSL) tunnel through an HTTP
proxy server. A proxy server addressed with that method would open a connec-
tion to the target server and forward all data regardless of its content. That way
a secure connection can be established from client to the server even though a
proxy server is in use.
Server responses
As stated above, each server reply always contains a status code. Generally server replies
are structured in 5 different categories. Status Codes are three digit numbers. Each category
can be identified by the first digit. These Categories split up the total set of status codes by
their meaning:
1xx Informational — For example 100 Continue
2xx Success — For example 200 OK
3xx Redirection — For example 301 Moved permanently
4xx Client Error — For example 404 Not found or 403 Forbidden
5xx Server Error — For example 500 Internal server error
Virtual Hosts
Virtual Hosts is a concept which allows multiple logical web servers to reside on one physi-
cal server, preferably with one IP Address. The different concepts are:
• A server is assigned multiple IP addresses, and each IP address is used by one single
logical web server.
• A server is assigned one IP address and the different logical web servers listen to dif-
ferent ports. This results in URLs looking like http://www.xyz.com:81/
• A server is assigned one IP address. Multiple Domain Names are mapped to that IP
address. All logical web servers listen to one single port. The server distinguishes
requests using the Host field, which is mandatory for HTTP requests since HTTP/1.1.
HTTP/1.0 did not explicitly support virtual hosts. A web server managing multiple do-
mains had to distinguish requests by the destination IP address or the port. As different
ports were rarely used for virtual hosts, the server needed one IP address for each domain
hosted. When the Internet began to grow rapidly, the amount of IP addresses available soon
was too limited. A solution based on different ports was inconvenient and could cause con-
fusion when a user forgot to supply the port number and received no or a wrong document.
HTTP/1.1 introduced the Host header field, which is mandatory in any HTTP/1.1 request.
Therefore a server can now host multiple domains on the same IP address and port, by
distinguishing the target of the request using the information supplied in the HOST header
field.
Content Negotiation
Usually the body of an HTTP response includes data for user interpretation. Different users
might be better served with different versions of the same document. Apache can keep
multiple versions of the same document, in either a different language or a different for-
mat. The included standard page displayed right after Apache is installed is an example as
there are multiple versions each in a different language. Two ways can be distinguished for
determining the best version for a user: server driven and client driven content negotiation.
Server Driven Content Negotiation With server driven content negotiation, the server
decides which version of the requested content is sent to the client. Using the Accept header
field, the client can supply a list of formats that would be acceptable to the user, regarding
format as well as language. The server will then try to select the best suitable content.
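A hypothetical request illustrating server driven negotiation: with the Accept and Accept-Language header fields the client states which formats and languages it prefers, and the quality values (q) weight the alternatives between 0 and 1.

GET /index.html HTTP/1.1
Host: www.myserver.com
Accept: text/html, application/xhtml+xml;q=0.9, */*;q=0.1
Accept-Language: de, en;q=0.7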
Client Driven Content Negotiation Using server driven content negotiation, the client has
no influence on the choice made by the server if none of the accepted formats of the source
are available. Since it is not practicable to list all possible formats in the desired order, the
client can use the Accept Header with the value Negotiate. The server will then reply with
a list of available formats instead of the document. In a subsequent request the client can
then directly request the chosen version.
Persistent Connections
HTTP/1.0 limited one TCP connection to last for one single request. When HTTP was de-
veloped, HTML documents usually consisted of the HTML file only, so the protocol was
appropriate. As web pages grew to multimedia sites, one single page consisted of more than
one document due to images, sounds, animations and so forth. A popular news web–site’s
index page today needs 150 different file requests to be displayed completely. Opening and
closing a TCP connection for every file imposed a delay for users and a performance over-
head for servers. Client and server architects soon added the header field "Connection:
keep-alive" to reuse TCP connections, despite the fact that it was not part of the HTTP
standard.
HTTP/1.1 therefore officially introduced persistent connections and the Connection header
field. By default a connection is now persistent unless specified otherwise. Once either part-
ner does not wish to use the connection any longer, it will set the header field "Connection:
close" to indicate the connection will be closed once the current request has been finished.
Apache offers configuration directives to limit the number of requests for one connection
and a time–out value, after which any connection has to be closed when no further request
is received.
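These limits are set with the KeepAlive, MaxKeepAliveRequests and KeepAliveTimeout directives. A configuration sketch with example values (not recommendations):

KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15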
Statistically, it is a well–known fact that a very high percentage of the HTTP traffic is accu-
mulated by a very low percentage of the available documents on the Internet. Also a lot of
these documents do not change over a period of time. Caching is a technique used to tem-
porarily save copies of the requested documents either by the client applications and/or by
proxy servers in between the client application and the web server.
Proxy Servers A proxy server is a host acting as a relay agent for an HTTP request. A
client configured to use a proxy server will never request documents from a web server
directly. Upon each request, it will open a connection to the configured proxy server and
ask the proxy server to retrieve the document on its behalf and to forward it afterwards.
Proxy Servers are not limited to one instance per request. Therefore a proxy server can be
configured to use another proxy server. The technique of using multiple proxy servers in
combination is called cascading. Proxy Servers are used for two reasons:
1. Clients may not be able to connect to the web server directly. Often proxy servers act
as intermediate nodes between private networks and public networks for security rea-
sons. A client on the private network unable to reach the public network can then ask
the proxy server to relay requests to the public network on its behalf. HTTP connec-
tivity is then ensured.
2. Caching proxy servers are often used for performance and bandwidth saving reasons.
A document often requested by multiple nodes only needs to be requested once from
the origin server. The proxy server which was involved in transmitting the document
can keep a local copy for a certain time to answer subsequent requests for the same
document without the need to contact the source’s web server. Bandwidth is saved,
and performance improves as well if a higher quality connection is used between the
proxy server and the clients.
Cache Control Even though caching is a favorable technique, it has its problems. When
caching a document, a cache needs to determine how long that document will be valid
for subsequent requests. Some information accessible to a cache is also of private or high
security nature and should in no case be cached at all. Therefore cache control is a complex
function that is supported by HTTP with a variety of header fields. The most important are:
Expires Using this header field, a server can equip a transmitted document with
something similar to a time–to–live. A client or proxy capable of caching
and evaluating that header field will only need to re–request the docu-
ment if the point in time appended to that header field has elapsed.
Last-Modified If the server cannot supply a certain expiration time, clients or proxies
can implement algorithms based on the Last-Modified date sent with a
document. HTTP/1.1 does not cover specific instructions on how to use
that feature. Clients and proxies can be configured to use that header field
as found appropriate.
Additionally, HTTP/1.1 includes several header fields for advanced cache control. Servers
as well as clients are able to ask for a document not to be cached. Servers can also explicitly
allow caching, and clients are even able to request a document from a caching proxy server
only if it has a cached copy of the document. Clients can also inform caching proxy servers
that they are willing to accept expired copies of the requested document.
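As an illustration with made-up values, a response that may be cached for one hour could carry header fields like the following; a Cache-Control field with the value no-store would instead forbid caching altogether.

Date: Mon, 19 May 2002 12:22:41 GMT
Last-Modified: Fri, 16 May 2002 08:12:31 GMT
Expires: Mon, 19 May 2002 13:22:41 GMT
Cache-Control: max-age=3600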
2.4 Access Control and Security

2.4.1 Authorization
If access to a requested resource is restricted and the restriction applies to persons, the server requests authentication data from the client to check the user’s identity.
After that it checks if the authorization rules grant access to the resource for this user.
Access to a resource can be restricted to the domain or network address of the browser
machine or to a particular person or group of persons. Apache determines this by pars-
ing the global and local configuration files. If permitted by the administrator, web authors
can restrict access to their documents via local configuration files which are usually named
.htaccess (hypertext access).
Authorization rules can combine machine and person access. They apply to resources and
to HTTP methods (e.g. disallow POST for unknown users). For further information on
administration, take a look at the configuration section 3.2.1.
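A minimal sketch of such a .htaccess file using HTTP basic authentication; the realm name and the path of the user database are made up, the directives themselves are standard Apache directives (see also section 3.2.1):

AuthType Basic
AuthName "Protected Area"
AuthUserFile /home/author/.htpasswd
Require valid-user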
• User ID & Password: The user has to tell the server his user ID and a secret password.
The server checks the combination using its user database. If it matches, the user is
authenticated.
The user database can have various forms:
– a simple file
– a database (local or on a different server)
– the authentication mechanism of the operating system, a database or any appli-
cation managing user data
– a user management service like LDAP², NIS³, NTLM⁴
• Digital signature: The user provides a certificate pointing out his or her identity. There
must be mechanisms to make sure that only the user and no one else can provide the
certificate (public/private keys, key cards, etc.)
How does the server get the authentication data? There are two ways:
• HTTP authentication
• HTML Forms, Java applets and server–side scripting
² Lightweight Directory Access Protocol: A protocol, introduced by Netscape, to access directory servers providing user information.
³ Network Information Service (formerly YP – Yellow Pages): A network service providing user information and workstation configuration data for Unix systems.
⁴ NT LanManager: A protocol for user authentication in SMB networks used by Microsoft Windows.

[Figure 2.8: Getting authentication data via HTTP authentication. If authorization is required and the request contains no or insufficient authentication data, the server sends a "401 unauthorized" response; the browser prompts the user for authentication data, stores it in a buffer and repeats the GET request with this data; the server checks the data and, if access is granted, sends the resource, which the browser then processes.]

When a request applies to a protected resource, Apache sends a 401 unauthorized response to the client. This is done to inform the client that he has to supply authentication information within his requests. Along with the 401 response the server sends the realm, the corresponding protected area of the website.
A browser then prompts the user for authentication data (username and password). Af-
terwards, the client repeats the request including the new authentication data. This data is
transmitted in HTTP header fields.
Since HTTP is a stateless protocol, authentication data must be retransmitted in every re-
quest. The browser normally keeps the authentication information for the duration of the
browser session. For every request concerning the same realm, the browser sends the stored
authentication information (see Figure 2.8).
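Written out as HTTP messages with made-up values, the exchange of figure 2.8 looks roughly like this: the realm is carried in the WWW-Authenticate field of the 401 response, and the repeated request carries user ID and password (here "Smith:secret", base64-encoded) in the Authorization field.

GET /protected/index.html HTTP/1.1
Host: www.myserver.com

HTTP/1.1 401 Unauthorized
WWW-Authenticate: Basic realm="Protected Area"

GET /protected/index.html HTTP/1.1
Host: www.myserver.com
Authorization: Basic U21pdGg6c2VjcmV0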
Generally, HTTPS should always be used for transmitting authentication data.
Get Authentication data with HTML Forms, Java applets or JavaScript
Browsers usually support basic authentication. Other methods require additional software
at the server side. There are several other possibilities to handle authentication data:
There is, for example, the possibility to enter authentication information into HTML forms.
This information is afterwards passed to the server with HTTP POST, in the body of the
request. This is different to basic authentication where the authentication data is sent in the
header of the request. A CGI program can then perform the authentication check.
Another possibility is the usage of Java applets or JavaScript code. The server sends a Java
applet or JavaScript code to the client. The browser executes the code which asks for au-
thentication data and sends it to the server in an arbitrary way.
Both methods need additional software at the HTTP server to do authentication. This could
for example be a CGI script that handles the content of a POST request. Additionally, as
before, HTTPS is needed for a secure transmission of the data.
2.4.3 HTTPS and SSL

Securing connections
For securing connections either symmetric or public/private key algorithms can be used.
With symmetric key algorithms both communication partners need the same key to encrypt
and decrypt the data. That imposes a problem, as the key has to be transmitted from one
partner to the other before secure communication can begin. Secure transfer of the symmet-
ric key has to be accomplished using other security mechanisms.
Public/private key mechanisms are based on two different keys. One communication part-
ner publishes the public key to anyone wishing to communicate. The public key is used
to encrypt messages that only the owner of the private key can decrypt. Employing that
technique on both sides can secure a two–way connection.
Symmetric key mechanisms usually require a smaller processing overhead than public key
mechanisms. Therefore public key securing mechanisms are often used to securely trans-
mit a symmetric key with a short validity. That secures the whole data transmission and
minimizes processing overhead.
Authentication by certificates
private key and therefore must be the entity he claims to be. Also when authenticating web
servers any certificate includes a domain name. A server can only be authenticated if it can
be reached via the domain name specified in the certificate.
SSL is a protocol which can be used together with any reliable data transfer protocol like
TCP. It employs mechanisms that provide Security and Authentication. SSL employs public
key mechanisms to exchange symmetric session keys which are then used to encrypt any
transmitted data. Certificates are used to ensure authentication. The combination of HTTP
and SSL is generally referred to as secured HTTP or HTTPS. By default it uses port 443
instead of the standard HTTP port 80. Browsers will indicate to the user that a secure con-
nection is in use and will also notify the user in case a certificate has expired or any other
situation occurred that made the establishment of a secure connection impossible.
HTTPS uses certificates to publish and verify the public key that is used to exchange the
symmetric key. Therefore in an HTTPS handshake, first the client requests the server’s cer-
tificate, checks that against the Certificate Authority’s certificate and uses the contained pub-
lic key to verify the server’s identity. The key is also used to exchange a “pre master secret”
which is used to create the symmetric key, also referred to as the “master secret”. At the end
of the handshake each communication partner informs the other that future transfer will be
encrypted and starts the HTTP session.
SSL–secured HTTP connections can be reused within a certain period of time to reduce the
overhead that the identification check and the tunnel creation imposes.
Today SSL is used in almost any Web Application that involves payment or private infor-
mation. Almost any eCommerce website and any online banking application that is based
on HTTP uses SSL to secure the connection.
SSL can be used to secure any protocol using TCP. The SSL handshake requires additional
actions from client and server, so establishing an SSL connection can not be transparent to
the application. A solution is to replicate server ports by tunneling the TCP connection
through an SSL connection.
Even though SSL also supports client authentication which allows the web server to clearly
identify the client as a person, it is rarely used, as a client certificate is required. Client certifi-
cates require the person wanting to be clearly identified to buy a certificate at a certification
authority. As hardly any company wants to force their customers to spend money just to
be able to buy their products, usually username and password based methods are used to
identify the client. However, the transmission of the user name and password is usually
secured using SSL.
2.5 Session Management

2.5.1 HTTP — a stateless protocol
HTTP is a stateless protocol. That means each HTTP request can be treated as a "new" re-
quest with no relations to previous requests. The server doesn’t have to keep any informa-
tion about previous requests from the client or any session information at all. The advantage
of this design is that no session information has to be kept at the server. This allows simple
and fast servers.
To implement client–server applications that are more complex and therefore need session
information, the client has to send at least a session identification to the server with every
request.
If a server application, for example a CGI program, needs to keep and manage state infor-
mation, for example a session, it has to get the complete state information with every request
or map an incoming request to a session. So the client has to transmit all state data or a state
ID with every request.
There are several possibilities to achieve that:
HTML Forms
A browser sends all data of an HTML form with a POST or GET request. The server sends a
form including state data stored in hidden form fields. Hidden form fields are not visible in
the browser, but can be seen in the source HTML code. Normally this information will not
be altered and just sent back to the server with every POST or GET of the form. A problem
lies within the fact that an experienced user can alter the information.
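A hypothetical sketch of such a form: the session ID in the hidden field is returned to the server with every submission of the form.

<form method="POST" action="/cgi-bin/order.cgi">
  <input type="hidden" name="SESSIONID" value="a7f3c9">
  <input type="text" name="QUANTITY">
  <input type="submit" value="Order">
</form>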
HTML links
It is also possible to generate links in HTML documents containing a state ID. This could
simply be attached to the request URL on every GET request. In contrast to the HTML
forms, this works with every kind of link in an HTML page.
Cookies
Cookies are small pieces of state information which a server can send to a browser within a response (using the Set-Cookie header field). The browser stores them and uses the accompanying name, domain and path information to decide which cookies to send with which request. If the client gets a newer cookie with
the same name from the same server, the old one is replaced.
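Sketched with made-up values: the server sets the cookie in a response, and the browser returns it in later requests to the same server and path.

HTTP/1.1 200 OK
Set-Cookie: SESSIONID=a7f3c9; path=/; expires=Mon, 26-May-2003 12:00:00 GMT

GET /next-page.html HTTP/1.1
Host: www.myserver.com
Cookie: SESSIONID=a7f3c9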
Cookies can be used to collect user data and to create detailed profiles of the user. For
example, using the information gathered with cookies, one can keep track of where a user
surfs on the Internet and what he is interested in. That is the reason why cookies are often
criticized.
"The state object is called a cookie, for no compelling reason." (Netscape Cookie Specification)
The server can send a Java applet or a JavaScript program to the browser which executes
this program. Such a program can then store the state information and communicate via
POST or GET or even with an arbitrary protocol with a program at the server machine.
However, the client browser must support execution of Java or script code. Many users don’t
like to execute Java applets or script code on their machine, since they could be dangerous if
the source cannot be trusted.
2.6 Dynamic Content

2.6.1 Server–side Scripting
An easy way to enable a server to deliver dynamic content is to employ server–side script-
ing. As mentioned above one of the first technologies to enable the server to provide dy-
namic content was CGI. To enable scripting using CGI, the web server executes an external
program, which is able to interpret scripts that return HTML code to the server which then
forwards the output to the client.
Optionally the server is enhanced using modules to support script languages that are in-
terpreted within the server’s context when a request is handled. The module supplies an
execution environment for the script and offers API functions to enable the script to access
external data sources and data received from the client.
In Apache, each script–language–enabling module usually registers a separate MIME–type
and defines a file extension for files supposed to be executed by the module’s content han-
dler.
Generally Server–side Scripting can be subdivided into scripts embedded in HTML files and
scripts that output complete HTML documents.
In this case the script is embedded in the HTML document file on the web server. Basically
the document includes HTML code like a static page. Within certain tags that are recognized
by the script–executing module, the developer can insert scripting instructions that will out-
put HTML code replacing the scripting instructions. The output these instructions generate
can be based on external data sources such as a database or on input received by the client
with the request.
Server–Side Includes (SSI) One example for scripting embedded in HTML code is "Server
Side Includes" also referred to as SSI. SSI enables basic commands like assigning values to
variables, accessing own and system variables, doing basic computation and even executing
system commands as on a command line and printing their output to the web page.
One common use for SSI is to include another file into a document. SSI can thus be
used to decorate CGI pages with a header and footer, saving some work when
programming CGI pages by excluding static HTML output from the CGI script. (See below
for information on CGI).
SSI commands can be included in HTML pages using special commands within comment
tags, which a browser will ignore in case the server is accidentally unable to interpret the
script:
<!--#your_command_here -->
Other examples for HTML enhancing script languages are JSP, PHP or ASP.
For complex web applications, HTML enriching script languages tend not to be performant
enough. Therefore compiled programs or scripts are used that output complete HTML doc-
uments. Files containing scripting commands may not include static HTML, which relieves
the server from having to parse the whole document for scripting instructions. Another rea-
son for these script languages to be a lot faster than languages allowing simple HTML code
in their files is that some of them can be compiled and therefore run a lot faster than scripts
that have to be interpreted.
Examples for that category are CGI programs and Java Servlets. Certain flavours of CGI and
Java Servlets have to be compiled and therefore gain performance.
Usually such technologies are a lot more powerful and therefore allow for more flexibility.
Communication with external programs is easier and some even start processes that run in
the background and are responsible for keeping state information. CGI is used to start exter-
nal programs (like directory info, e.g. ls -l or dir) and send their output to the client. The
drawback of CGI’s approach is the fact that the program is started in a separate context (e.g.
a process) when a request is received which can drastically affect performance. Additionally
communication is restricted to the use of environment variables and the use of STDIN and
STDOUT.
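As a small illustration of this interface: a CGI program receives request data via environment variables (and, for POST requests, via STDIN) and writes its response, a Content-type header followed by a blank line and the body, to STDOUT. The following C program is a generic sketch, not taken from the Apache distribution.

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* Data appended to the URL of a GET request arrives in QUERY_STRING */
    const char *query = getenv("QUERY_STRING");

    /* The CGI response starts with header fields, followed by a blank line */
    printf("Content-type: text/html\r\n\r\n");
    printf("<html><body>\n");
    printf("<p>Query string: %s</p>\n", query ? query : "(none)");
    printf("</body></html>\n");
    return 0;
}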
Scripting technologies employing modules to interpret scripts overcome these limitations
as scripts are interpreted in the server context. Therefore no context switch is needed and
communication between server and script is only limited by the server’s and the module’s
API. With that technology, a request arriving at the web server will trigger the execution or
interpretation of the script and forward all output to the client. Therefore the script outputs
HTML code, which is sent to the client and interpreted by the client’s browser as if it was a
static HTML page.
The complexity of a script application is usually not limited by the script language’s capabilities but by the server’s performance. For very complex applications it might be worth implementing an add–on module compiled into the web server, which usually gains performance as neither interpretation nor a context switch is needed. Additionally, the possibility to use the complete server API without the restrictions an interpreting module may imply allows for more powerful applications.
Chapter 3

The Apache HTTP Server
3.1 Overview
3.1.1 History
The Beginning
Apache is an offspring of the NCSA httpd web server designed and implemented by Rob
McCool. He made the NCSA server as the web server market was dominated by com-
plex and heavyweight server systems. Many people wanted a simple and small server that
would be adequate for the needs of a simple web site. However Rob McCool was not able
to continue his work on the NCSA server due to work commitments. He left the project and
abandoned the server. At that stage a lot of people were already using the NCSA server.
As with any software system, a lot of people patched the server to suit their own demands
and to get rid of bugs. In 1995 Brian Behlendorf started to collect those patches and exten-
sions and founded a mailing list that was solely used for exchanging them. A
group of 8 people that formed that mailing list community released the very first version of
Apache. Due to its nature of consisting of patches and extensions to the NCSA server, the
name Apache is actually a derivative of "a patchy server".
Evolution
The first version that was released by the mailing list was version 0.6.2. One member of
the growing group of developers, Robert Thau, designed a new server architecture that was
introduced with version 0.8.8. On December 1st in 1995, Apache version 1.0 was released
and only took a year to become more popular and more widely used than the older and
formerly popular NCSA server.
During the next years the group began to grow and Apache received many new features
and was ported to different operating systems.
In 1999 the group founded the Apache Software Foundation to form a non–profit company.
In March 2000 the ApacheCon, a conference for Apache developers, was held for the first
time.
Apache 2
At the ApacheCon conference in March 2000, Apache version 2.0 Alpha 1 was introduced
to the public. The version 2 of Apache again introduced a complete redesign of the server
architecture. Apache 2.0 is easier to port to different operating systems and is considered so
modular that it does not even have to be used as a web server. By designing appropriate
modules the Apache 2.0 core could be used to implement any imaginable network server.
Today both versions of Apache, version 1.3 and version 2.0 exist. Even though people are
encouraged to use the newer version many still use version 1.3 which is still being further
developed and supported.
For more information on Apache history see the Apache History Project’s website at
http://www.apache.org/history/.
3.1.2 Features
Both versions of Apache used today form the biggest share of the web server market. Even
though the fact that Apache is free is an influencing factor, the main reason for Apache’s
success is its broad range of functionality.
[Figure 3.1: The Apache HTTP Server in its environment: several web browsers connected via HTTP, the server machine running the Apache HTTP Server, and further servers (e.g. a database) that it can access.]
Apache is a server that supports concurrency and can therefore serve a big number of clients.
The number of clients that can be served concurrently is limited only by the underlying
hardware and operating system. The server can be easily configured by editing text files
or using one of the many GUIs that are available to manage these. The server can be re-
configured without having to stop the server. Due to its modularity, many features that are
necessary within special application domains can be implemented as add–on modules and
plugged into the server. To support that, a well documented API is available for module
developers. Its modularity and the existence of many free add-on modules makes it easy to
build a powerful web server without having to extend the server code. Using many of the
available server based scripting languages, web based applications can be developed eas-
ily. When using scripting languages or add–on modules Apache can even work with other
server applications like databases or application servers. Therefore Apache can be used in
common multi–tier scenarios. Additionally Apache is completely HTTP 1.1 compliant in
both of the current versions. The installation is easy and ports for many popular platforms
are available.
The block diagram in figure 3.1 shows an Apache HTTP Server in its environment. In com-
parison to the simple HTTP Server system shown in figure 2.1, we see an administrator,
configuration by files and server extensions using CGI or the Server API. These extensions
(Information processor and provider) can access any resources at the server machine or via
network at any remote machine.
3.2 Using Apache

3.2.1 Configuration
The usage and administration of Apache are covered by many other publications, so this document only gives as much of an overview as is necessary to understand how Apache works, as a basis for the following parts of this document.
There are basically four ways to configure Apache:
1. Settings chosen when the server is compiled (e.g. which modules are included),
2. command–line options passed to httpd when the server is started,
3. the global configuration file (httpd.conf), and
4. local, per–directory configuration files (.htaccess).
Figure 3.2: Configuring the Apache HTTP Server via configuration files
The last two possibilities describe the configuration of Apache via text files as it can be seen
in figure 3.2. The following parts will focus on explaining the structure of these configura-
tion files and give examples on how to use them.
Global configuration The configuration directives in the main Apache server configura-
tion file httpd.conf are grouped into three basic sections:
1. Directives that control the operation of the Apache server process as a whole (the
’global environment’).
2. Directives that define the parameters of the ’main’ or ’default’ server, which responds
to requests that aren’t handled by a virtual host. These directives also provide default
values for the settings of all virtual hosts.
3. Settings for virtual hosts, which allow HTTP requests to be sent to different IP ad-
dresses and/or hostnames and have them handled by the same Apache server in-
stance.
Directives placed in the main configuration files apply to the entire server. If you wish
to change the configuration for only a part of the server, you can scope your direc-
tives by placing them in <Directory>, <DirectoryMatch>, <Files>, <FilesMatch>,
<Location>, and <LocationMatch> sections. These sections limit the application of the
directives which they enclose to particular file system locations or URLs.
The <Directory> sections apply to ’real’ directories at any position in the file system,
whereas <Location> sections apply to the Request URIs.
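As a brief illustration (the paths and directives chosen here are only examples), directives can be scoped once by a file system location and once by a request URI:

# scoped by a file system location
<Directory "/var/www/html/manual">
    Options FollowSymLinks
</Directory>

# scoped by a request URI
<Location /manual>
    SetHandler default-handler
</Location>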
Apache has the capability to serve many different websites with different host names simul-
taneously. This is called virtual hosting. Directives can therefore also be scoped by placing
them inside <VirtualHost> sections, so that they only apply to requests for a particular
website.
In the global server configuration file the webmaster can configure the server with the pro-
vided directives and limit the options the users have in the per–directory configuration files
(.htaccess files).
Changes to the main configuration files are only recognized by Apache when it is started or
restarted.
Syntax
Apache gets its instructions through configuration directives used in the configuration files.
There are two types of directives, simple ones and sectioning directives, which again can
contain one or more directives.
Apache processes the files line by line, reading any line that is neither empty nor a
comment line beginning with the character ’#’. The first word in such a line is the name of
the directive, whereas the remaining ones are treated as the parameters of the directive. To
use more than one line for the parameters of a directive, the backslash ’\’ may be used as
the last character on a line to indicate that the parameters continue on the next line.
Apache distinguishes between several contexts in which a directive can be used. Each di-
rective is only allowed within a fixed set of contexts.
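As a short sketch of this syntax (the values are made up), the following fragment contains a comment, a simple directive, a sectioning directive and a parameter list continued on a second line with a backslash:

# connection timeout in seconds
Timeout 300

<Directory "/var/www/html">
    Options Indexes FollowSymLinks \
            MultiViews
</Directory>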
Global Configuration The ’Per–Server Context’ applies to the global httpd.conf file (the
file name can be overridden by the -f option on the httpd command line) and is divided
into five sub–contexts:
1. The global context which contains directives that are applied to the default or main
server.
2. (<VirtualHost>) The virtual host sections contain directives that are applied to a par-
ticular virtual server.
3. (<Directory>, <DirectoryMatch>) The directory sections contain directives that are
applied to particular directories and their sub–directories.
4. (<Files>, <FilesMatch>) The file sections contain directives that are applied to par-
ticular files.
5. (<Location>, <LocationMatch>) The URL sections contain directives that are ap-
plied to a particular URL and its sub–areas.
Directives placed in the main configuration file apply to the entire server. To change the con-
figuration for only a part of the server, place your directives in the appropriate context. Some
section types can also be nested, allowing for very fine grained configuration. Generally, all
directives can appear in the global configuration file.
Local Configuration The ’Per–Directory Context’ applies to the local .htaccess files,
which only allow configuration changes using directives of the following five sub-contexts:
1. (AuthConfig) The authorization context contains directives that control authentication
and authorization.
2. (Limits) The Limit context contains directives that control access restrictions.
3. (Options) The Option context contains directives that control specific directory fea-
tures.
4. (FileInfo) The File information context contains directives that control document at-
tributes.
5. (Indexes) The Index context contains directives that control directory indexing.
Directives placed in .htaccess files apply to the directory where you place the file, and all
sub–directories. The .htaccess files follow the same syntax as the main configuration files.
The server administrator further controls what directives may be placed in .htaccess files
by configuring the ’AllowOverride’ directive in the main configuration files.
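A small example (paths and file names are made up): the global configuration permits authentication directives to be overridden for one directory, and the .htaccess file placed in that directory makes use of this permission:

# httpd.conf (global configuration)
<Directory "/var/www/html/private">
    AllowOverride AuthConfig
</Directory>

# /var/www/html/private/.htaccess (local configuration)
AuthType Basic
AuthName "Private area"
AuthUserFile /etc/httpd/users
Require valid-user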
Further information and a list of directives with allowed contexts can be found in [1] and at
http://httpd.apache.org/docs/.
2. Location walk
The configuration for the URI has to be retrieved before the URI is translated.
4. Directory walk beginning from root (/) directory, applying .htaccess files
Apache reads the configuration of every section of the path and merges them.
5. File walk
Apache gets the configuration for the files.
When Apache determines that a requested resource actually represents a file on the disk, it
starts a process called ’directory walk’. For this, Apache has to check its internal
list of <Directory> containers, built from the global configuration files, to find those that
apply. Depending on the settings in the global configuration files, Apache possibly also
searches the directories on the file system for .htaccess files.
Whenever the directory walk finds a new set of directives that apply to the request, they are
merged with the settings already accumulated. The resulting collection of settings applies
to the final document, assembled from all of its ancestor directories and the server’s config-
uration files.
When searching for .htaccess files, Apache starts at the top of the file system. It then
walks down the directories to the one containing the document. It processes and merges
any .htaccess files it finds that the global configuration files say should be processed.
The sections are merged in the following order (see also the Apache documentation):
1. <Directory> (except those with regular expressions) and .htaccess files, processed
simultaneously, with .htaccess (if allowed) overriding <Directory>
2. <DirectoryMatch> and <Directory> sections with regular expressions
3. <Files> and <FilesMatch>, processed simultaneously
4. <Location> and <LocationMatch>, processed simultaneously
Each group is processed in the order that they appear in the configuration files. Only
<Directory> is processed in the order “shortest directory component to longest”. If mul-
tiple <Directory> sections apply to the same directory they are processed in the configu-
ration file order. The configuration files are read in the order httpd.conf, srm.conf and
access.conf (srm.conf and access.conf are deprecated and are kept only for backward–
compatibility).
Sections inside <VirtualHost> sections are applied after the corresponding sections outside
the virtual host definition. This way, virtual hosts can override the main server configura-
tion. Finally, later sections override earlier ones.
For details see sections 4.4.4 and 4.5.
Example Configuration
httpd.conf:
#########################################
# Section 1: Global Environment
# Many of the values are default values, so the directives could be omitted.
ServerType standalone
ServerRoot "/etc/httpd"
Listen 80
Listen 8080
Timeout 300
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15
MinSpareServers 5
MaxSpareServers 10
StartServers 5
MaxClients 150
MaxRequestsPerChild 0
#########################################
# Section 2: "Main" server configuration
ServerAdmin [email protected]
ServerName www.foo.org
DocumentRoot "/var/www/html"
<Directory "/var/www/html">
Options Indexes FollowSymLinks MultiViews
AllowOverride None
Order allow,deny
Allow from all
</Directory>
#########################################
# Section 3: virtual hosts
<VirtualHost www.foo.dom:80>
<IfModule mod_status.c>
<Location /server-status>
SetHandler server-status
Order Deny,Allow
Deny from all
Allow from .foo.com
</Location>
</IfModule>
</VirtualHost>
3.2.2 Performance
Means to slow down a server
(to be completed)
(too few processes, insufficient memory, too many checks (for example: checks for symbolic
links), complex configuration (for example: too many .htaccess files, too many modules))
Configuration
(to be completed)
(choose proper configuration for intended usage)
3.3 Extending Apache: Apache Modules
3.3.1 Introduction
Modules are pieces of code which can be used to provide or extend the functionality of the
Apache HTTP Server. Modules can either be statically or dynamically included with the
core. For static inclusion, the module's source code has to be added to the server's source
distribution and the whole server has to be compiled. Dynamically included modules add
functionality to the server by being loaded as shared libraries during start–up or restart of the
server. In this case the module mod_so provides the functionality to add modules dynam-
ically. In a current distribution of either Apache 2.0 or Apache 1.3, all but very basic server
functionality has been moved to modules.
Modules interact with the Apache server via a common interface. They register handlers for
hooks in the Apache core or in other modules. The Apache core calls all registered handlers of
a hook when the hook is triggered. Modules on the other hand can interact with
the server core via the Apache API. Using that API each module can access the server’s data
structures, for example for sending data or allocating memory.
Each module contains a module–info, which contains information about the handlers pro-
vided by the module and which configuration directives the module can process. The mod-
ule info is essential for module registration by the core.
All Apache server tasks, be it master server or child server, contain the same executable
code. As the executable code of an Apache task consists of the core, the static modules and
the dynamically loaded ones, all tasks contain all modules.
Figure 3.4: Interaction of modules with the Apache core: module registry, hook handler registry, filter registry and optional function registry
As you can see in figure 3.4, Modules and the Core can interact in two different ways. The
server core calls module handlers registered in its registry. The modules on the other hand
can use the Apache API for various purposes and can read and modify important data
structures like the request/response record request_rec and allocate memory in the corre-
sponding pools.
Besides handlers for hooks, a module can also register:
• Filters
• Optional functions
A hook is a transition in the execution sequence where all registered handlers will be called.
It’s like triggering an event which results in the execution of the event handlers. The imple-
mentation of a hook is a hook function named ap_run_HOOKNAME which has to be called to
trigger the hook.
Two types of calling handlers for a hook can be distinguished:
RUN_ALL/VOID: The core calls all registered handlers in their order, regardless of whether
they complete the task or refuse to complete it, unless an error occurs.
RUN_FIRST: The core calls the registered handlers in their order until one module can com-
plete the task or an error occurs.
Each module has to register its handlers for the hooks with the server core first before the
server core can call them. Handler registration is different in Apache 1.3 and 2.0. Apache
1.3 provided 13 predefined hooks. Registration of the module’s handlers was done auto-
matically by the core by reading the module info while loading modules. In Apache 2.0, the
module info only contains references to four handlers for predefined hooks used for configu-
ration purposes only. All other hook handlers are registered by calling the register_hooks
function each module has to provide. This makes it easier to provide new hooks without
having to alter the Apache module interface. A module can even provide new hooks for which
other modules can register handlers as well.
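The following minimal sketch shows what such a registration could look like in Apache 2.0. The module and handler names are made up for illustration; only the general pattern (handler function, register_hooks function, module info) follows the mechanism described above. Since the content handler hook is of the RUN_FIRST type, returning DECLINED lets the next registered handler try.

#include <string.h>

#include "httpd.h"
#include "http_config.h"
#include "http_protocol.h"

/* Hypothetical content handler: only serves requests mapped to "example-handler" */
static int example_handler(request_rec *r)
{
    if (!r->handler || strcmp(r->handler, "example-handler") != 0)
        return DECLINED;                 /* let the next handler try */
    ap_set_content_type(r, "text/plain");
    ap_rputs("Hello from mod_example\n", r);
    return OK;
}

/* Called by the core after loading the module; registers all hook handlers */
static void example_register_hooks(apr_pool_t *p)
{
    ap_hook_handler(example_handler, NULL, NULL, APR_HOOK_MIDDLE);
}

/* Module (meta) info: configuration handlers, command table, register_hooks */
module AP_MODULE_DECLARE_DATA example_module =
{
    STANDARD20_MODULE_STUFF,
    NULL,                    /* create per-directory config */
    NULL,                    /* merge per-directory config  */
    NULL,                    /* create per-server config    */
    NULL,                    /* merge per-server config     */
    NULL,                    /* command table               */
    example_register_hooks   /* register hooks              */
};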
Figure 3.5 shows how hooks and handlers interact in Apache: A hook ABC has to be defined
by some C macros (AP_DECLARE_HOOK, etc., see the bottom of the figure). This results in the
creation of a registration procedure ap_hook_ABC, a hook caller procedure ap_run_ABC and
an entry in the hook handler registry which keeps information about all registered handlers
for the hook with their modules and their order. The module (meta) info at the top points to
the hook handler registration procedure (register_hooks), which registers the handlers for
the hooks by calling the ap_hook_xxx procedures.
Figure 3.5: Definition and use of a hook: module meta information with configuration handlers and command table, hook handler registration (register_hooks), hook handler registry, hook callers (ap_run_...) and the request processing controller triggering them
At the bottom, an agent called “request processing controller” represents all agents that
trigger hooks by calling the ap_run_xxx procedures, which read the hook handler registry
and call all or one of the registered handlers.
The order in which handlers are called for a hook can be important. In Apache 1.3, the order
of module registration determined the order in which their handlers were called. The
order could be altered in the configuration file, but it was the same for all 13 hooks. In Apache
2, this has changed. The hook registry can store an individual order of handlers for each
hook. When registering a handler for a hook using the ap_hook_xxx procedure, a module can
state demands for its position in the calling sequence. It can name modules whose handlers
have to be called before or after its own, or it can try to get the first or the last position.
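A hedged sketch of such an ordering demand (the module names and the handler are made up; only ap_hook_handler and its order arguments are the real interface):

#include "httpd.h"
#include "http_config.h"

static int example_handler(request_rec *r)
{
    return DECLINED;     /* placeholder body for this sketch */
}

static void example_register_hooks(apr_pool_t *p)
{
    /* handlers of the named modules should be called after ours */
    static const char * const run_after_us[] = { "mod_include.c", NULL };

    /* additionally ask for an early position in the calling sequence */
    ap_hook_handler(example_handler, NULL, run_after_us, APR_HOOK_FIRST);
}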
A module can provide its own set of directives which can be used in the configuration files.
The configuration processor of the core delegates the interpretation of such a directive
to the corresponding command handler which has been registered for the directive. In figure
3.5 the module (meta) info at the top points to the configuration management handlers of the
module (create-dir-config, merge-dir-config, etc.) and to the command table which contains
configuration directives and the pointers to the corresponding command handlers.
The configuration management handlers have the purpose to allocate memory for configu-
ration data read by the command handlers and to decide what to do if configuration param-
eters differ when hierarchically merging configuration data during request processing.
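As a sketch of these two aspects (all names are invented; the create/merge configuration handlers that would allocate the example_server_conf structure are omitted), a module could declare a directive “ExampleLog” taking one parameter and a command handler storing its value:

#include "httpd.h"
#include "http_config.h"

module AP_MODULE_DECLARE_DATA example_module;     /* defined elsewhere in the module */

/* hypothetical per-server configuration data */
typedef struct {
    const char *log_name;
} example_server_conf;

/* command handler for the (made-up) directive "ExampleLog <filename>" */
static const char *set_example_log(cmd_parms *cmd, void *dummy, const char *arg)
{
    example_server_conf *conf =
        ap_get_module_config(cmd->server->module_config, &example_module);
    conf->log_name = arg;
    return NULL;                 /* NULL tells the configuration processor: no error */
}

/* command table: maps directives to command handlers; TAKE1 = one parameter */
static const command_rec example_cmds[] =
{
    AP_INIT_TAKE1("ExampleLog", set_example_log, NULL, RSRC_CONF,
                  "name of the example log file"),
    { NULL }
};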
Optional functions
An Apache 2.0 module can also register optional functions and filters. Optional Functions
are similar to hooks. The difference is that the core ignores any return value from an optional
function. It calls all optional functions regardless of errors. So optional functions should be
used for tasks that are not crucial to the request–response process at all.
The most important step in the request–response loop is calling the content handler which
is responsible for sending data to the client.
In Apache 1.3, the content handler is a handler very much like any other. To determine
which handler to call Apache 1.3 uses the type_checker handler which maps the requested
resource to a mime–type or a handler. Depending on the result, the Apache Core calls
the corresponding content handler which is responsible for successfully completing the re-
sponse. It can write directly to the network interface and send data to the client. That
makes request handling a non-complex task but has the disadvantage that usually only one
module can take part in handling the request. If more than one content handler have been
determined for the resource, the handler that was registered first is called. It is not possible
that one handler can modify the output of another without additional changes in the source
code.
Apache 2.0 extends the content handler mechanism by output filters. Although still only one
content handler can be called to send the requested resource, filters can be used to manip-
ulate data sent by the content handler. Therefore multiple modules can work cooperatively
to handle one request. During the mime–type definition phase in Apache 2.0 multiple filters
can be registered for one mime–type together with an order in which they are supposed to
handle the data. Each mime–type can be associated with a different set of modules and a
differing filter order. Since a sequenced order is defined, these filters form a chain called the
output filter chain.
When the Content handler is called, Apache 2.0 initiates the output filter chain. Within that
chain a filter performs actions on the data and when finished passes that data to the next
filter. That way a number of modules can work together in forming the response. One
example is a CGI content handler handing server side include tags down the module chain
so that the include module can handle them.
Apache 2 Filters are handlers for processing data of the request and the response. They have
a common interface and are interchangeable.
Figure 3.6: Example input and output filter chains between request processing and the socket, with filters such as SSL, DEFLATE, HTTP HEADER and CORE
In figure 3.6 you see two example filter chains: The input filter chain to process the data
of the request and the output filter chain to process the data of the response (provided by
the content handler). The agent “Request processing” triggers the input filter chain while
reading the request. An important use of the input filter chain is the SSL module providing
secure HTTP (HTTPS) communication.
The output filter chain is triggered by the content handler. In our example, the Deflate
output filter compresses the resource depending on its type.
To improve performance, filters work independently by splitting the data into buckets and
brigades (see figure 3.7) and just handing over references to the buckets instead of writing all
data to the next filter's input (see figure 3.8). Each request or response is split up into several
brigades. Each brigade consists of a number of buckets. One filter handles one bucket at a
time and, when finished, hands the bucket on to the next filter. Still, the order in which the
filters hand on the data is kept intact.
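A minimal sketch of an output filter (the filter name and function names are invented): it could inspect or change the buckets of the brigade, but here it simply passes the brigade on to the next filter:

#include "httpd.h"
#include "http_config.h"
#include "http_protocol.h"
#include "util_filter.h"

/* pass-through output filter: hand the brigade on to the next filter unchanged */
static apr_status_t example_out_filter(ap_filter_t *f, apr_bucket_brigade *bb)
{
    return ap_pass_brigade(f->next, bb);
}

/* insert_filter hook handler: add the filter to the output filter chain */
static void example_insert_filter(request_rec *r)
{
    ap_add_output_filter("EXAMPLE", NULL, r, r->connection);
}

static void example_register_hooks(apr_pool_t *p)
{
    ap_register_output_filter("EXAMPLE", example_out_filter, NULL,
                              AP_FTYPE_RESOURCE);
    ap_hook_insert_filter(example_insert_filter, NULL, NULL, APR_HOOK_MIDDLE);
}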
Figure 3.7: A brigade consisting of several buckets
Figure 3.8: Apache Filters: A Filter chain using Brigades for data transport
Besides separating filters into input and output filters, 3 different categories can be distin-
guished:
During start–up or restart, the Apache master server reads and processes the configuration
files. Each module can provide a set of configuration directives. The configuration proces-
sor of the core will call the associated command handler every time it encounters a directive
belonging to a module. To prepare resources for storing configuration data, a module can
register handlers for the following hooks:
For more information about Apache configuration and the configuration processor, consult
sections 3.2.1 and 4.5.
Apache is a multitasking server. During start–up and restart, there is only one task per-
forming initialization and reading configuration. Then it starts spawning child server tasks
which will do the actual HTTP request processing. Depending on the multiprocessing strat-
egy chosen, there may be a need for another initialization phase for each child server to
access resources needed for proper operation, for example connect to a database. If a child
server terminates during restart or shutdown, it must be given the opportunity to release its
resources.
pre_config
This hook is triggered after the server configuration has been read, but before it
has been processed during start–up and during the restart loop (see also figure
4.16 on page 86). The pre_config handler is executed by the master server task
and can take advantage of the privileges (if executed by root/administrator).
open_logs
A module can register a handler for this hook if it needs to open log files or start
a logging process. Some MPMs (see section 4.3.4) also use this hook to access
resources like server sockets.
pre_mpm (internal)
This internal hook is triggered by the master server before starting the child
servers. As it lies in the responsibility of a MPM, the core registers handlers
for the hook, for example to initialize the scoreboard.
Child_init
Any handler registered for this hook is called once for each child process just
after its creation (On the win32 platform, it is executed by the only child process
before starting the worker threads). Here a module can perform tasks needed for
every child process before starting request processing.
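A small sketch of a child_init handler (the function names are made up; ap_hook_child_init and the handler signature follow the Apache 2.0 API):

#include "httpd.h"
#include "http_config.h"

/* called once for each child process just after its creation, e.g. to open
   a per-child database connection; pchild is the child's memory pool */
static void example_child_init(apr_pool_t *pchild, server_rec *s)
{
    /* allocate and initialize per-child resources from pchild here */
}

static void example_register_hooks(apr_pool_t *p)
{
    ap_hook_child_init(example_child_init, NULL, NULL, APR_HOOK_MIDDLE);
}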
For more information about the multitasking server strategies, MPMs and master and child
server tasks, take a look at section 4.3 in the next chapter.
All following handlers actually deal with request handling and are part of the request–
response loop. Figures 3.9 and 3.10 illustrate the sequential structure of that process.
Establish a connection and read the request Figure 3.9 shows the behavior of Apache
during the request–response loop. Most of the hooks shown here didn’t exist in Apache 1.3.
Figure 3.9: Behavior of Apache during the request–response loop: accept connection, create connection, process connection (create request, read headers) while keep-alive holds, flush and close connection
pre connection
This hook allows a module to do any setup required just before processing, but after ac-
cepting the connection, for example registering input filters.
process connection
Run the correct protocol. The default is the HTTP protocol handler which pro-
cesses requests on the connection until a timeout occurs. Figure 3.9 shows what
happens inside the HTTP process connection handler.
Process a request and send a response Figure 3.10 shows how Apache processes an HTTP
request. The special case of internal requests will not be explained further.
Figure 3.10: Processing an HTTP request: quick handler, translate name, map to storage, header parser, access checker, check user ID, auth checker, type checker, fixups, insert filter, content handler, log transaction (sub requests skip some of these steps)
quick handler
This hook is triggered before any request processing and can be used as shortcut
(see top of figure 3.10). Cache modules can use this hook.
translate name
One module can translate the Request URI to a file name or to a different resource
name.
map to storage (internal)
Determine the configuration for the requested resource, usually considering the
“directory” and “files” sections. The proxy module registers a handler for this
hook.
header parser
Here all modules can access the header and read module specific information,
for example cookies. Since a similar task can be performed with the post read request hook,
this one is not used in the standard Apache modules (a small sketch of such a handler
follows after this list).
access checker
This hook can be used to check whether the client machine may access the re-
quested resource. Mainly this function is used to exclude or include specific IP ad-
dress ranges or user agents (browsers). All modules are involved.
check_user_id
The hook is supposed to check whether the credentials supplied by the user (if
any) are valid. Usually this means to look up user–name and password in a user
database. Only one handler is allowed to perform this task.
auth checker
Here one module can check whether a user whose identity has been checked for
a valid password in the preceding step, is authorized to access the resource he
requested. Only one handler is allowed to complete this task.
type_checker
This hook allows one handler to determine or set the MIME type of the requested
resource. The result has an impact on the selection of the content handler. The
handler or even a filter may alter the MIME type of the response afterwards.
fixups
At this step all modules have a last chance to modify the response header
(for example to set a cookie) before the content handler is called.
insert filter
This hook lets modules insert filters in the output filter chain.
handler
The hook for the content handler is the most important one in the request re-
sponse loop: It generates or reads the requested resource and sends data of the
response to the client using the output filter chain.
log transaction
Here each module gets a chance to log its messages after processing the request.
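As an illustration of one of these hooks (names invented), a header parser handler could read a cookie from the request header without influencing further processing:

#include "httpd.h"
#include "http_config.h"
#include "http_request.h"
#include "apr_tables.h"

/* hypothetical header_parser handler: read module-specific header information */
static int example_header_parser(request_rec *r)
{
    const char *cookie = apr_table_get(r->headers_in, "Cookie");
    if (cookie) {
        /* evaluate the cookie and store module data, e.g. in r->pool */
    }
    return DECLINED;     /* do not influence further request processing */
}

static void example_register_hooks(apr_pool_t *p)
{
    ap_hook_header_parser(example_header_parser, NULL, NULL, APR_HOOK_MIDDLE);
}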
The module mod_cgi is discussed in detail to illustrate the structure of Apache modules with
a practical example.
Both distributions of Apache 1.3 and 2.0 include mod_cgi. This module is used to process
CGI programs that can create dynamic web content. Due to the architectural differences be-
tween versions 1.3 and 2.0 discussed in the previous chapter the two versions of the module
are different.
Module Info and referenced functions Usually the module info can be found at the end
of the main source file for the specific module. In mod_cgi for Apache 1.3 the module info
contains references to 2 handlers:
module MODULE_VAR_EXPORT cgi_module =
{
STANDARD_MODULE_STUFF,
NULL, /* initializer */
NULL, /* dir config creater */
NULL, /* dir merger - default is to override */
create_cgi_config, /* server config */
merge_cgi_config, /* merge server config */
cgi_cmds, /* command table */
cgi_handlers, /* handlers */
NULL, /* filename translation */
NULL, /* check_user_id */
NULL, /* check auth */
NULL, /* check access */
NULL, /* type_checker */
NULL, /* fixups */
NULL, /* logger */
NULL, /* header parser */
NULL, /* child_init */
NULL, /* child_exit */
NULL /* post read-request */
};
The first line within the module struct references the macro STANDARD_MODULE_STUFF,
which expands to the information each module has to provide. Two functions referenced in
here are create_cgi_config and merge_cgi_config. The corresponding hooks for these
handlers are create server config and merge server config. If you have a look at the
two functions you will see that the first allocates and initializes memory for configuration
data and the second merges the data stored for each virtual host with data stored for the
master server.
The references for command table and content handler do not point to functions but to
structs. The command table struct contains references to the functions used to process the
different directives that can be used for configuring mod_cgi. Within the command table
each function is referenced with the additional keyword TAKE1 which tells the core that
only one parameter is accepted.
The struct for the content handler registers the CGI mime-type as well as the "cgi–script"
handler string with the function cgi_handler, which is the function called by the core for
the content handler. Using that struct a module can register functions for more than one
handler.
When the type_checker has decided that the mod_cgi module should handle a request and
the core then calls the content handler, it actually calls the function cgi_handler.
cgi_handler first prepares for executing a CGI program by checking some preconditions,
like "Is a valid resource requested?". Then it creates a child process by calling ap_bspawn_child
that will execute the CGI program. Parameters for that function are, among others, the name
of the function to be called within the process, here cgi_child, and a child_stuff struct
that contains the whole request record. cgi_child itself then prepares to execute the in-
terpreter for the script and calls ap_call_exec, which is a routine that takes the different
operating systems into account and uses the exec routines working for the currently used
operating system. After that, all output of the script is passed back to the calling functions
until it reaches the cgi_handler function, which then sends the data to the client
including the necessary HTTP header.
Module Info and referenced functions In the version for Apache 2.0 the module info
is much smaller. Most references to handlers for hooks are now replaced by the reference
to the function register_hooks. All handlers except the handlers for the configuration
management hooks are now dynamically registered using that function.
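A sketch of how such an Apache 2.0 module info looks (the field order is given by STANDARD20_MODULE_STUFF; the names follow the description above, the exact mod_cgi source may differ in details):

module AP_MODULE_DECLARE_DATA cgi_module =
{
    STANDARD20_MODULE_STUFF,
    NULL,                 /* create per-directory config       */
    NULL,                 /* merge per-directory config        */
    create_cgi_config,    /* create per-server config          */
    merge_cgi_config,     /* merge per-server config           */
    cgi_cmds,             /* command table                     */
    register_hooks        /* register all other hook handlers  */
};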
Even though the syntax may vary, semantically the functions for configuration and the
command table perform the same actions as in Apache 1.3. Having a look at the
register_hooks function, you can see an example of how to influence the order in which
handlers are processed: while the cgi_post_config function shall be called absolutely first
when the hook post_config is triggered, the cgi_handler should be called somewhere in
the middle when the content handler hook is triggered.
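A sketch consistent with this description (simplified; the handler declarations are shown only to make the fragment complete, the real mod_cgi code differs in details):

static int cgi_post_config(apr_pool_t *pconf, apr_pool_t *plog,
                           apr_pool_t *ptemp, server_rec *s);
static int cgi_handler(request_rec *r);

static void register_hooks(apr_pool_t *p)
{
    /* demand the very first position for the post_config handler */
    ap_hook_post_config(cgi_post_config, NULL, NULL, APR_HOOK_REALLY_FIRST);

    /* an ordinary position in the middle for the content handler */
    ap_hook_handler(cgi_handler, NULL, NULL, APR_HOOK_MIDDLE);
}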
Request handling employing filters In mod_cgi for Apache 2.0, the function
cgi_handler is the start of the output filter chain. At first it behaves very much like its
Apache 1.3 counterpart: it prepares to start a process that executes the CGI program and then
retrieves the response data from that process. Most of the execution is done in the cgi_child
function.
After it has received the response from the program, its task is to hand the just created brigade
down the filter chain. That is done at the end of the function with a call to ap_pass_brigade.
For example, it is now possible for a cgi program to output SSI (server–side includes) com-
mands which are then processed by the include module. In that context the include module
must have registered a filter that now gets the data from mod_cgi. Of course that depends
on the configuration for the corresponding MIME type.
The Apache API summarizes all possibilities to change and enhance the functionality of
the Apache web server. The whole server has been designed in a modular way, so that
extending functionality means creating a new module to plug into the server. The previous
sections covered the way in which modules should work when the server calls them. This
section explains how modules can successfully complete their tasks.
Basically, all the server provides is a big set of functions that a module can call. These
functions expect complex data structures as arguments and return complex data structures.
These structures are defined in the Apache sources.
Again the available features differ between the two major Versions of Apache 1.3 and
2.0. Version 2.0 basically contains all features of 1.3 and additionally includes the Apache
Portable Runtime (APR) which enhances and adds new functionality.
Apache offers functions for a variety of tasks. One major service Apache offers to its modules
is memory management. Since memory management is a complex task in C and memory
leaks are among the hardest bugs to find in a server, Apache takes care of freeing all used
memory after a module has finished its tasks. To accomplish that, all memory has to be
allocated via the Apache core. For this purpose, memory is organized in pools. Each pool is
associated with a task and has a corresponding lifetime. The main pools are the server,
connection and request pool. The server pool lives until the server is shut down or restarted,
the connection pool lives until the corresponding connection is closed, and the request pool is
created upon arrival of a request and destroyed after finishing it. Any module can request any
type of memory from a pool. That way the core knows about all used memory. Once the pool
has reached the end of its lifetime, the core deallocates all memory managed by the pool. If a module
needs memory that should even have a shorter lifetime than any of the available pools a
module can ask Apache to create a sub pool. The module can then use that pool like any
other. After the pool has served its purpose, the module can ask Apache to destroy the pool.
The advantage is that if a module forgets to destroy the sub pool, the core only has to destroy
the parent pool to destroy all sub pools.
Additionally, Apache offers to take care of array and table management, which again makes
memory management easier. Arrays in Apache can grow in size over time and thus corre-
spond to vectors in Java; tables contain key/value pairs and thus are similar to hash
tables. Section 4.6 provides further information about the pool technology.
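A brief sketch of these services (the function names are real APR/Apache API calls, the surrounding helper is hypothetical): a sub–pool of the request pool is created, some scratch data and a table are allocated from it, and destroying the sub–pool releases everything at once:

#include "httpd.h"
#include "apr_pools.h"
#include "apr_strings.h"
#include "apr_tables.h"

/* hypothetical helper called from a request handler */
static void scratch_work(request_rec *r)
{
    apr_pool_t *scratch;
    apr_table_t *attrs;
    char *copy;

    apr_pool_create(&scratch, r->pool);          /* sub-pool of the request pool */

    copy = apr_pstrdup(scratch, r->uri);         /* memory taken from the sub-pool */
    attrs = apr_table_make(scratch, 4);          /* table of key/value pairs */
    apr_table_set(attrs, "uri-copy", copy);

    /* ... work with copy and attrs ... */

    apr_pool_destroy(scratch);    /* frees all memory allocated from the sub-pool */
}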
Data types
Apache offers a variety of functions that require special parameters and return values of
special structure. Therefore it is essential to know about the Data types used by the Apache
API. Most fields contained in any of these records should not be changed directly by a mod-
ule. Apache offers API functions to manipulate those values. These functions take necessary
precautions to prevent errors and ensure compatibility in later versions.
request_rec
This is the most important data structure within Apache. It is passed to the mod-
ule’s handlers during the request–response phase. It contains all information
about the request as well as references to information and configuration data of
the main server and virtual host (server_rec) and information about the con-
nection the request belongs to (connection_rec). Each module can find the
reference to the memory pool for that request and to a variety of information
that has been gathered about that request so far. It also contains a structure that
contains various different formats of the URI from URI translation phase.
The name is somewhat misleading, as this data structure is also used to gather
data for the response, especially header fields.
server_rec
This structure contains information about the server, especially configuration.
Apache keeps these structures for the main server and for each virtual host.
Based upon the request a module handles, the core passes the corresponding
server structure to the module. The structure itself contains fields with infor-
mation like server name and port, timeout and keep–alive settings. Via the
server_rec a module can also access its own server–based configuration di-
rectives.
connection_rec
This structure contains information about the current connection. With HTTP 1.1
multiple requests can be submitted via one connection. Therefore a connection
can exist longer than one request as long as the server and the client support
persistent connections. Since the connection_rec also contains a memory pool,
any module dealing with a specific request can store data that is persistent dur-
ing one connection. The connection_rec can be used to determine the main
server if the connection was made to a virtual server. Various user data is also
stored here.
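A short sketch of a (hypothetical) content handler that touches all three records; the fields used here (uri, server_hostname, port, remote_ip) exist in the Apache 2.0 records:

#include "httpd.h"
#include "http_protocol.h"

static int show_records(request_rec *r)
{
    server_rec *s = r->server;        /* (virtual) server the request belongs to */
    conn_rec   *c = r->connection;    /* connection carrying this request        */

    ap_set_content_type(r, "text/plain");
    ap_rprintf(r, "URI:    %s\n", r->uri);
    ap_rprintf(r, "Server: %s, port %d\n", s->server_hostname, (int)s->port);
    ap_rprintf(r, "Client: %s\n", c->remote_ip);
    return OK;
}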
API Functions
Besides memory management, Apache assists the modules in various ways. The main goal
of the API is full abstraction from the operating system and the server configuration. Apache
offers functions for manipulating Apache data structures and can take precautions the mod-
ule does not need to know about. The core can also perform system calls on behalf of the
module and will use the routines corresponding to the operating system currently in use.
That way each module can be used in any operating system environment. For example, the
Apache API includes functions for creating processes, opening communication channels to
external processes and sending data to the client. Additionally, Apache offers functions for
common tasks like string parsing.
Within the Apache API, functions can be classified into the following groups:
• Memory Management
• Array Manipulation
• Table Manipulation
• Dynamic Linking
• Logging
For further information on the Apache 1.3 API and how to write Apache Modules see [4].
Version 2.0 of Apache introduces the Apache Portable Runtime (APR), which adds new
functionality to the Apache API and enhances existing functionality. Due to the APR, Apache
can be considered a universal network server that could be extended with almost any
imaginable functionality. This includes the following platform independent features:
• Network I/O
• Time
• Authentication
A goal of the APR is to set up a common framework for network servers. Any request
processing server could be implemented using the Apache core and the APR.
Chapter 4
Inside Apache
4.1 Introduction
This chapter focuses on the implementation of Apache. You should only read it if you are
interested in the details of the implementation.
Some sections contain descriptions related closely to the source code. We analyzed Apache
version 1.3.17 in the project’s first year and compared these results to Apache 2.0 during the
second year. The online version of this document provides links to locations in the source
code of both versions.
You should know the concepts of an HTTP server shown in chapter 2 and be familiar with
the characteristics of Apache depicted in chapter 3. Furthermore you should be able to read
C code.
Section 4.2 illustrates the first step of a source analysis: You need to figure out the structure
of the Apache source distribution.
A basic concept of a network server is the way it handles concurrency. Section 4.3 at first
introduces a common way to implement a multitasking server and then compares it with
the way Apache handles concurrency.
Figure 4.8 gives an overview of the behavior of the server and introduces the loops the
further sections focus on. It also covers the server’s reaction to the control commands of the
administrator (restart or end).
The master server loop shown in section 4.3.3 is closely related to the multitasking model of
Apache.
The request–response loop in section 4.4 forms the heart of every HTTP server and is there-
fore similar to the behavior of a simple HTTP server shown in figure 2.2. In this section you
will also find the “other side” of the module callbacks explained in section 3.3.
The further sections of this chapter deal with implementation details of various Apache
concepts.
Figure 4.1: Directory structure of the Apache HTTP Server 1.3.17 source distribution
The source distribution of Apache version 1.3.17 contains 780 files in 44 subdirectories; 235
files contain C source code. Figure 4.1 shows the directory structure of the Apache 1.3.17
source distribution. The most important directory is src. It contains subdirectories with
all source files of the Apache web server and most modules. For us, the most interesting
subdirectories are main and modules as they contain the central source code which is needed
to understand how Apache works.
For more details, browse through the on–line document “Structure of the Apache HTTP
Server 1.3.17 source distribution” (see sources)
Figure 4.2: Directory structure of the Apache HTTP Server 2.0.45 source distribution
Figure 4.2 shows the directory structure of the Apache 2.0.45 source distribution.
Compared to version 1.3, Apache 2 is much bigger. The Apache 2.0.45 distribution includes
2165 files in 183 subdirectories; 704 files contain C source code (approx. 280,000 lines
including comments). The directory structure of the source distribution has also changed
significantly. Nonetheless, the parts important for us are still located in a few central
directories. The core files are now located in the directory server, with the MPMs located in
the subdirectory mpm. The directory modules still contains subdirectories with all modules
included in the distribution.
More details can be found browsing the on–line document “Structure of the Apache HTTP
Server 2.0.45 source distribution” (see sources/)
4.3 Multitasking Server Architectures
One factor influencing the choice of a multitasking architecture is the operating system:
certain multitasking mechanisms may not be supported or may not perform well enough on
certain operating systems. The second major influencing factor is the usage scenario. Depending
on how much processing is involved with a single request, how many requests a server will
have to handle and/or whether requests logically depend on each other, certain architectures
might be more advantageous than others.
Section 4.3.1 explains how a common multitasking network server architecture works and
discusses its shortcomings if used as an HTTP server. The Apache server architecture will
be shown in section 4.3.2.
Figure 4.3: Structure of a multitasking server like the inetd: master server, child servers and the TCP/IP communication service
In figure 4.3 you see the structure of this kind of multiprocessing server. At the top there
are one or many clients sending requests (R) to the server. The requests are received by the
TCP/IP Communication Service of the operating system. The Master Server has registered
itself as responsible for any request that comes in. Therefore the communication service
wakes it up. The Master Server accepts the connection request so the communication service
can establish the TCP/IP connection and create a new socket data structure.
The master server creates a child server process by doing a fork() system call. In figure 4.3
this is symbolized by the “write” arrow from the master server to the storage area enclosed
by a dashed line. The child server knows about the connection because it knows the
connection socket. It can now communicate with the client until one of them closes the connection.
Meanwhile the master server waits for new requests to arrive.
With TCP/IP, a connection end point has to be identified by the IP address of the machine
and a port number (for example, the port for HTTP requests has number 80). The master
server process registers itself as a listener for the port (which in turn becomes a server port).
Note that a connection request arrives at the server port while the connection is established
using a connection port. The major difference between a server and a connection port is that
a server port is solely used to accept connections from any client. A connection port uses the
same TCP/IP port number but is associated with one specific connection and therefore with
one communication partner. Connection ports are used to transmit data and therefore the
server port can remain open for further connection requests.
Figure 4.4: Behavior of the inetd–style master server: listen at the server ports (listen(), select()), accept an incoming request (accept()), create a child server process (fork(), exec()) which communicates with the client and terminates
The behavior of the server is shown in figure 4.4. The system calls accept() or select()
block the master server process until a request comes in, that means the server process
remains inactive until a request arrives. accept() waits for requests on one server port while
select() is a means to observe multiple ports. In this case, after a request has been received
by the TCP/IP Communication Service, the master server can establish the connection with
the system call accept(). After that it creates a new process
with fork(). If the request has to be handled by a different program, it has to be loaded
and executed with exec().
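The following minimal C sketch (error handling omitted, handle_connection() is a made-up placeholder) condenses this accept–and–fork behavior:

#include <unistd.h>
#include <sys/socket.h>

void handle_connection(int conn);    /* placeholder for the actual request handling */

/* server_sock: a socket that is already bound to the server port and listening */
static void master_server_loop(int server_sock)
{
    for (;;) {
        int conn = accept(server_sock, NULL, NULL);   /* block until a request arrives */
        if (conn < 0)
            continue;
        if (fork() == 0) {            /* child server process */
            handle_connection(conn);
            close(conn);
            _exit(0);
        }
        close(conn);                  /* master server keeps listening */
    }
}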
The INETD
The INETD is a typical server using the multitasking architecture described above. It waits
for requests on a set of ports defined in the configuration file /etc/inetd.conf. When-
ever a request comes in, inetd starts the (child) server program defined in the configuration
file. That program then handles the request.
Apache also provides a mode of operation to work with the inetd. In this case, the inetd is
the gatekeeper for the HTTP port (80) and starts Apache whenever an HTTP request arrives.
Apache answers the request and exits.
Drawbacks
This multiprocessing architecture is useful if the handling of the client request takes some
time or a session state has to be kept by the (child) server because the communication be-
tween client and server does not end after the response to the first request.
HTTP, however, is a stateless protocol. No session information needs to be kept by the server
— it only needs to respond to one request and can “forget” about it afterwards. An HTTP
server based on the inetd architecture would be inefficient. The master server would have
to create a process for each HTTP connection, which would handle this one connection only
and then die. While the master server creates a process it cannot accept incoming requests.
Although process creation does not take a long time on modern operating systems, this
gatekeeper function of the master server forms a bottleneck for the entire server.
The preforking architecture is based on a pool of tasks (processes or threads) which play 3
different roles: one task is the listener (the leader) waiting for connection requests, some
tasks are workers processing requests, and the remaining tasks are idle workers queueing up
to become the listener.
Figure 4.5: The leader–followers pattern used in the preforking server architecture
A description of the pattern can be found in [5]. Figure 4.5 shows the structure of the system:
The listener is the leader. Only one task can be granted the right to wait for connection
requests. If the listener gets a request, it hands over the right to listen and switches its
role to worker, that means it processes the request using the connection it established as
listener. When it has finished processing the request, it closes the connection and becomes an
idle worker. That means it queues up, waiting to become the listener. Usually an idle worker
task will be suspended.
What are the differences between the server strategy described in section 4.3.1 and the
leader–follower strategy? Firstly, an incoming request will be treated immediately by the
listener task; no new task has to be created. On the other hand, there should always be a
certain number of idle worker tasks to make sure there is always a listener. Secondly, there is
no need to pass information about a request to another task, because the listener just switches
its role and keeps the information.
The task pool must be created at server start. The number of tasks in the pool should be big
enough to ensure quick server response, but a machine has resource restrictions. The solution
is to control the number of tasks in the pool by another agent: the master server.
Preforking Architecture
The Preforking architecture was the first multitasking architecture of Apache. In Apache 2.0
it is still the default MPM for Unix. The Netware MPM very closely resembles the Prefork-
ing functionality, with the exception that it uses Netware threads instead of Unix processes.
In summary, the Preforking architecture of Apache takes a conventional approach, as
each child server is a process by itself. That makes Preforking a stable architecture but also
reduces performance.
Figure 4.6: Structure of the Preforking architecture of Apache: master server, child servers, global configuration data, scoreboard in shared memory, accept mutex and pipe of death
The structure diagram in figure 4.6 shows the structure of the Preforking architecture of
Apache 2.0 and is important for the description of the behavior of Apache. You can see
which component is able to communicate with which other component and which storage
a component can read or modify. The block diagram for the Apache 2.0 MPM version of
the Preforking architecture very much resembles the version that was used in Apache 1.3;
however, there is one difference: the master server uses a “pipe of death” instead of signals
to shut down the child servers for a (graceful) restart.
The Preforking architecture shown in figure 4.6 seems to be very similar to the inetd archi-
tecture in figure 4.3 at first sight. There is one master server and multiple child servers. One
big difference is the fact that the child server processes exist before a request comes in. As the
master server uses the fork() system call to create processes and does this before the first
request comes in, it is called a preforking server. The master server doesn’t wait for incoming
requests at all — the existing child servers wait and then handle the request directly.
The master server creates a set of idle child server processes, which register with the TCP/IP
communication service to get the next request. The first child server getting a connection
handles the request, sends the response and waits for the next request. The master server
adjusts the number of idle child server processes within given bounds.
General Behavior
Figure 4.7 shows the overall behavior of the server, including the master server and the child
servers.
Figure 4.7: Overall behavior of master server and child servers: first–time initialization, restart loop (read configuration, create child servers, proclaim new generation), master server loop, child server initialization, request–response loop with keep–alive loop, clean–up
• First–time initialization:
Allocate resources, read and check configuration, become a daemon.
Figure 4.8 shows the behavior of Apache using the preforking architecture in greater detail.
As each multitasking architecture distinguishes itself from others by using different means
to create and organize child servers, the behavior of the different multitasking architectures
differs mainly in the way the child servers (also called workers) are created during the restart
loop and in the way they are monitored and replaced within the master server loop.
Initialization The server structure shown in figure 4.6 has to be set up at start–up (start
processes, create sockets and pipes, allocate memory) and destroyed at shutdown. This is
called activation and deactivation.
There are three types of initializations: the first–time initialization, the restart initialization
performed by the master server in the restart loop, and the child server initialization.
Apache 2.0 starts with main(). After entering the restart loop, it calls the configured MPM
using ap_mpm_run(). (Apache 1.3 using Preforking starts with the procedure REALMAIN().)
The following comments explain the operations shown in figure 4.8:
• create static pools: Apache initializes memory areas in its own memory management
(pool management, see section 4.6)
• register information about prelinked modules: The biggest part of the HTTP server
functionality is located in the modules (see section 3.3 for further details). Modules
can either be included in the apache binary or loaded dynamically. Even if they are
included in the binary (prelinked), they have to be registered.
• read command line and set parameters: The administrator can override defaults or
config file configuration data with command line parameters. The command line pa-
rameter -X enforces the ’one process mode’ and can be used for debugging
purposes. It prevents the creation of child server processes. If no child server pro-
cesses exist, there is no need for a master server: the one existing process enters the
request–response loop and behaves like a single child server.
• read per–server configuration: The master server (nothing else exists at this time) reads
the configuration files and merges the information with its configuration data. Con-
figuration data also includes information about the modules to be loaded. Note that
configuration data has to be read a second time in the restart loop!
“per–server configuration” means all static configuration data in contrast to the con-
figuration data in .htaccess files called “per–request configuration”.
• graceful_mode := false: At this time only the master server process exists, so there is
no sense in using graceful mode. (In graceful mode — see section 4.3.3 — Apache
performs a restart keeping active child servers alive.)
• detach process: Each process is usually a child process of the process that created it.
The parent process can be the shell process, for example. If the shell terminates, all
child processes of the shell process are terminated, too. Furthermore, all input and
output streams of the child process (STDIN, STDOUT, STDERR) are connected with
the shell.
Apache performs the detach after it has read the configuration data and tried to initial-
ize the modules. After the detach no error message will be printed in the shell, because
the master server has disconnected from the shell and now runs as a background task.
The detach process consists of the following actions:
The ’one_process_mode’ is useful for debugging purposes. Apache skips the detach opera-
tion and is still available for the debugger.
Restart Loop Every time the administrator forces a restart of the Apache server, it pro-
cesses the restart loop which can be found in main(). (In Apache 1.3, the restart loop is
located in the procedure standalone_main().) After reading the configuration, it calls
ap_mpm_run() of the Preforking MPM.
The loop has the following parts:
1. initialize and prepare resources for new child servers, read and process configuration
files
2. create the child server processes and register them in the scoreboard
3. Master server: observe child servers (Master Server Loop, see section 4.3.3).
Child servers: handle HTTP requests (Request–Response Loop, see section 4.4).
4. kill child servers (graceful restart: kill idle child servers only)
The configuration files are read by the master server only. The child servers get their con-
figuration data when they are created by the master server. Whenever the administrator
wants to apply configuration changes to the server, he has to enforce a restart. A graceful
restart allows child server processes to complete their processing of current requests before
being replaced by servers of the new generation. Each child server updates its status in the
scoreboard (a shared memory area) and compares its own generation ID with the global
generation ID, whenever it completes request handling.
• read per–server configuration: The master server reads and processes the configura-
tion files. At this time only the master server (and maybe some non–idle child servers
of the old generation) exist.
• set up server sockets for listening: Apache can listen on many ports. It is important
not to close the server sockets during restart.
• init scoreboard: In case a graceful restart is processed, the scoreboard entries for the
remaining child servers must be kept. Otherwise there are no child servers and the
scoreboard can be initialized.
• one_process_mode: This mode is used for debugging purposes (see also detach). The
master server becomes child server and enters the request–response loop.
• startup children & register them in the scoreboard: The master server creates child
server processes with the procedure startup_children(). It uses the fork() system
call. As a consequence, all child server processes get a copy of the memory image of
the master server and of its system resources. Therefore they “know” the configuration
data and have access to the TCP/IP sockets and the log file sockets.
If Apache is started by the super user (root), the master server process is the only
process using the root User ID. The child server processes initialize, set their user ID to
a non–privileged account like “nobody” or “wwwrun” and enter the request–response
loop.
The master server creates an entry in the scoreboard for every child server including
its process ID and generation ID.
• Master server loop: (see section 4.3.3 and figure 4.9) At the beginning of the loop the master server waits for a certain time or until it receives the notification that a child server has died. Then it counts the number of idle child servers and regulates their number by creating or killing one.
• proclaim new generation: Each time the master server processes the restart loop, it
increments the generation ID. All child servers it creates have this generation ID in
their scoreboard entry. Whenever a child server completes the handling of a request, it
checks its generation ID against the global generation ID. If they don’t match, it exits
the request–response loop and terminates.
This behavior is important for the graceful restart.
• finish all/idle children: Both shutdown and restart result in the death of all child
servers. When the administrator requests a graceful restart, only the idle child servers
are killed.
• free resources: Apache returns the occupied resources to the system: Memory, TCP/IP
ports and file handles.
Apache is controlled by signals. Signals are a kind of software interrupt; they can occur at any time during program execution. The processor stops normal program execution and processes the signal handler procedure. If none is defined in the current program, the default handler is used, which usually terminates the current program. After the execution of the signal handler, the processor returns to normal execution unless the program was terminated by the signal.
Administrator controls the master server The administrator can send signals directly (using kill at the shell command prompt) or with the help of a script. The master server reacts to three signals: SIGTERM (shutdown), SIGHUP (restart) and SIGUSR1 (graceful restart).
The signal handlers for the master server are registered in the procedure set_signals(). It registers the signal handler procedures sig_term() and restart().
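A sketch of such a registration with the plain POSIX signal API; the handler bodies are placeholders, only the signal-to-handler mapping follows the description above:

    #include <signal.h>

    static volatile sig_atomic_t shutdown_pending = 0;
    static volatile sig_atomic_t restart_pending  = 0;
    static volatile sig_atomic_t is_graceful      = 0;

    static void sig_term(int sig) { (void)sig; shutdown_pending = 1; }
    static void restart(int sig)  { restart_pending = 1; is_graceful = (sig == SIGUSR1); }

    /* Sketch of set_signals(): attach the handlers to the three signals. */
    static void set_signals(void)
    {
        struct sigaction sa;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;

        sa.sa_handler = sig_term;
        sigaction(SIGTERM, &sa, NULL);     /* shutdown */

        sa.sa_handler = restart;
        sigaction(SIGHUP,  &sa, NULL);     /* restart */
        sigaction(SIGUSR1, &sa, NULL);     /* graceful restart */
    }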
In the upper right corner of figure 4.8 you see a small petri net describing the behavior of
the signal handler of the master server. Apache activates and deactivates signal handling at
certain points in initialization and in the restart loop. This is not shown in figure 4.8.
Master Server controls the child servers While the administrator controls the master server by signals only, the master server uses either signals or a pipe, the Pipe of Death, to control the number of child servers. (Apache 1.3 used signals only.)
For a shutdown or non–graceful restart, the master server sends a SIGHUP signal to the pro-
cess group. The operating system “distributes” the signals to all child processes belonging to
the group (all child processes created by the master server process). The master server then
“reclaims” the notification about the termination of all child servers. If not all child processes
have terminated yet, it uses increasingly stronger means to terminate the processes.
A graceful restart should affect primarily the idle child server processes. While Apache
1.3 just sent a SIGUSR1 signal² to the process group, Apache 2 puts “Char of Death” items into the Pipe of Death (PoD). The busy child servers will check the pipe after processing a request, even before comparing their generation. In both cases they set the die_now flag and terminate upon beginning a new iteration of the request–response loop.
² The Apache 1.3 child server’s reaction to a SIGUSR1 signal: terminate if idle, else set deferred_die and terminate later. (See the signal handler usr1_handler() registered at child server initialization.)
Table 4.1 lists the differences between a normal and a graceful restart.
Pipe of Death (PoD) The Master server of Apache 2.0 uses the Pipe of Death for inter–
process communication with the child servers to terminate supernumerary ones and during
graceful restart. All child servers of one generation share a pipe.
If the master server puts a Char of Death in the queue using ap_mpm_pod_signal() or
sends CoD to all child servers with ap_mpm_pod_killpg(), these procedures also create
a connection request to the listening port using the procedure dummy_connection() and
terminate the connection immediately. The child server waiting for new incoming connec-
tion requests (the listener) will accept the request and skip processing the request as the
connection is already terminated by the client. After that it checks the PoD which causes
it to terminate. Busy child servers can continue serving their current connection without
being interrupted.
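The mechanism can be sketched roughly as follows; pod_signal(), pod_check() and CHAR_OF_DEATH are illustrative stand-ins for the real ap_mpm_pod_* procedures and dummy_connection():

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #define CHAR_OF_DEATH '!'              /* placeholder value */

    /* Master server side: ask one idle child server to exit. */
    static void pod_signal(int pod_write_fd, unsigned short listen_port)
    {
        char cod = CHAR_OF_DEATH;
        write(pod_write_fd, &cod, 1);      /* put a Char of Death into the pipe */

        /* Dummy connection: wake up the listener, then hang up immediately. */
        int s = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(listen_port);
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        connect(s, (struct sockaddr *)&addr, sizeof(addr));
        close(s);
    }

    /* Child server side: checked after every request. The read end of the
     * pipe is assumed to be non-blocking, so an empty pipe does not block. */
    static int pod_check(int pod_read_fd)
    {
        char c;
        return read(pod_read_fd, &c, 1) == 1;   /* 1 = Char of Death: set die_now */
    }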
Overview In this loop the master server on the one hand controls the number of idle child servers and on the other hand replaces the child servers it has just killed while performing a graceful restart.
[Figure 4.9: The master server loop: wait for the death of a child or a time–out; process the child status (process_child_status, find_child_by_pid, reap_other_child); on shutdown or restart send SIGTERM or SIGHUP to the process group, reclaim the child processes or write Chars of Death into the Pipe of Death; then perform_idle_server_maintenance counts the idle servers and free slots in the scoreboard, warns if the number of idle servers is too low, and starts (make_child, regulated exponentially by idle_spawn_rate) or kills child servers according to ap_daemons_min_free and ap_daemons_max_free]
While the restart loop can be found within the server core in main(), the master server loop is located within the corresponding MPM, in this case the Preforking MPM’s ap_mpm_run(). (In Apache 1.3 it can be found in the procedure standalone_main() in the file http_main.c.)
In figure 4.9 you see the details of the loop. The upper part deals with the reaction to
the death of a child process and special issues of a graceful restart. The lower part is
labeled “perform idle server maintenance”. It shows a loop in which the master server
counts the number of idle servers and gets a list of free entries in the scoreboard. It com-
pares the number of idle children (idle_count) with the limits given in the configura-
tion (ap_daemons_max_free and ap_daemons_min_free). If there are too many idle
servers, it kills exactly one of them (the last idle server in the scoreboard). If the number
of idle child servers is too low, the master server creates as many child server processes as
needed (see exponential mode below).
Graceful Restart — Reaction to the death of a child process The following remarks
mainly deal with the graceful restart and the reaction to the death of a child process:
• pid := wait or timeout: The wait() system call is used to wait for the termi-
nation of a child process created with fork(). After waiting for a given period of
time, the master server continues execution of the master server loop even if it has not
received a termination notification.
– process_child_status: Get the reason for the death of the child process
– find_child_by_pid: Look for the scoreboard entry
– entry (slot) found: set child status to SERVER DEAD. If
remaining_children_to_start is not zero, create a new child server
to replace the dead child server.
– entry not found: Check whether this child process has been an “other child”³ (reap_other_child(), see below). If it is neither an “other child” nor does a scoreboard entry match, and if graceful mode is set, then the following situation must have happened:
The administrator has reduced the number of allowed child servers and forced a graceful restart. A child server process that had been busy had a slot greater than the allowed number. Now it terminates, but its entry cannot be found in the scoreboard.
³ In some cases the master server has to create child processes that are not child server processes. They are registered as “other child” in a separate list. An example: instead of writing log data to files, Apache can stream the data to a given program. The master server has to create a process for that program and connect its STDIN stream with the logging stream. This process is an “other child”. Whenever the server is restarted, the logging process gets a SIGHUP or SIGUSR1. Either it terminates and has to be re–created by the corresponding module (the logging module in this example) or it stays alive. The module must check the “other child” list to find out whether it has to create or re–use a process.
Performing Idle Server Maintenance The lower part of figure 4.9 shows the behavior of the procedure perform_idle_server_maintenance(), which is called whenever a time–out occurred and the graceful restart has been finished.
The master server counts the number of idle servers and the number of remaining slots (entries) in the scoreboard and compares them with three limits given in the configuration:
ap_daemons_limit: the maximum number of child servers, i.e. the sum of busy and idle child servers and free slots, or simply the number of slots of the scoreboard.
ap_daemons_min_free: the minimum number of idle child servers.
ap_daemons_max_free: the maximum number of idle child servers.
Exponential mode: Some operating systems may slow down if too many child processes are
created within a short period. Therefore the master server does not immediately create the
needed number of child servers with make_child(). It creates one in the first loop, two
in the second, four in the third and so on. It holds the number of child servers to be created
in the next loop turn in the variable idle_spawn_rate and increases it with every turn until
the number of idle child servers is within the limit.
Example: ap_daemons_min_free is set to 5 but suddenly there is only 1 idle server left. The
master server creates one child server and waits again. 2 idle servers are still not enough,
so the master creates 2 more child servers and waits again. In the meantime, a new request
occupies one of the new child servers. The master server now counts 3 idle child servers
and creates 4 new ones. After the time–out it counts 7 idle child servers and resets the
idle_spawn_rate to 1.
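The regulation can be condensed into the following sketch; it simplifies perform_idle_server_maintenance(), and MAX_SPAWN_RATE, make_child() and kill_one_idle_child() are stand-ins rather than Apache's exact names and values:

    /* Stand-ins for the real helpers. */
    void make_child(void);
    void kill_one_idle_child(void);

    #define MAX_SPAWN_RATE 32              /* assumed upper bound, not Apache's value */

    static int idle_spawn_rate = 1;

    /* Simplified idle-server regulation with exponential spawning. */
    static void regulate_idle_servers(int idle_count, int min_free, int max_free)
    {
        if (idle_count > max_free) {
            kill_one_idle_child();         /* too many idle servers: kill exactly one */
            idle_spawn_rate = 1;
        } else if (idle_count < min_free) {
            int i;
            for (i = 0; i < idle_spawn_rate; i++)
                make_child();              /* 1, 2, 4, 8, ... children per loop turn */
            if (idle_spawn_rate < MAX_SPAWN_RATE)
                idle_spawn_rate *= 2;
        } else {
            idle_spawn_rate = 1;           /* enough idle servers: reset the rate */
        }
    }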
The Child Servers, sometimes referred to as workers, form the heart of the HTTP server, as they are responsible for handling requests. While the multitasking architecture is not responsible for handling requests, it is still responsible for creating child servers, initializing them, maintaining them and relaying incoming connections to them.
Initialization, Configuration and Server restarts The master server creates child server processes using the fork() system call. Processes have separate memory areas and are not allowed to read or write another process’s memory. It is a good idea to process the configuration only once, in the master server, rather than in each child server. The configuration could be stored in a shared memory area which could be read by every child server process. As not every platform offers shared memory, the master server processes the configuration files before it creates the child server processes. The child server processes are clones of the master server process and therefore have the same configuration information, which they never change.
Whenever the administrator wants to apply changes to the server configuration, he has to advise the master server to read and process the new configuration data. The existing child
server processes have the old configuration and must be replaced by new processes. To
avoid interrupting the processing of HTTP requests, Apache offers the “graceful restart”
mode (see section 4.3.3), which allows child servers to use the old configuration until they
have finished processing their request.
The initialization of a child server can be found in the corresponding MPM (Preforking:
child_main(), Apache 1.3: child_main()). It consists of the following steps (see also
figure 4.13):
• establish access to resources: The child server process has just been created by the mas-
ter server using fork(). At this time the child server process has the same privileges
as the master. This is important if Apache has been started by the super user (root).
Before the child server sets its user ID to a non–privileged user, it must get access to
common resources.
• Set up time–out handling: To avoid infinite blocking of the child server, Apache uses a time–out for the request handling. It uses alarms, a concept similar to signals: it is like setting an alarm clock to a given time and abandoning the processing of the request when the “alarm bell rings”. This is done using the concept of “long jump” (see the sketch after this list).
• set status := ready in the scoreboard except after a new generation has been an-
nounced.
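The time–out handling mentioned in the list above can be sketched with alarm() and a long jump; this shows the general technique, not Apache's exact code, and process_one_request() is a placeholder:

    #include <setjmp.h>
    #include <signal.h>
    #include <unistd.h>

    void process_one_request(void);        /* placeholder for the real request handling */

    static sigjmp_buf timeout_env;

    static void alarm_handler(int sig)
    {
        (void)sig;
        siglongjmp(timeout_env, 1);        /* the "alarm bell rings" */
    }

    /* Abandon request handling after timeout_seconds via alarm + long jump. */
    static void handle_request_with_timeout(int timeout_seconds)
    {
        signal(SIGALRM, alarm_handler);
        if (sigsetjmp(timeout_env, 1) != 0)
            return;                        /* jumped here: the request timed out */

        alarm(timeout_seconds);            /* set the "alarm clock" */
        process_one_request();
        alarm(0);                          /* success: cancel the alarm */
    }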
After having received a connection, the child server releases the accept mutex and processes the request: it becomes a worker and lets the next process wait for a request. This is usually called the Leader–Follower pattern: the listener is the leader, the idle workers are the followers (see figure 4.5). As Apache uses operating system dependent techniques for the mutex, it is possible, depending on the operating system, that all currently blocked child servers are woken when one child server returns the mutex after receiving a connection. If so, excessive scheduling is caused unnecessarily, as only one of the woken child servers will get the mutex; the others will be blocked and therefore return to sleep. That is a problem which is addressed by the Leader MPM, where the followers are organized in a way such that only one of them is woken when the accept mutex is returned.
Once a connection is received by a child server, the scope of responsibility of the multitask-
ing architecture ends. The child server calls the request handling routine which is equally
used by any multitasking architecture.
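Condensed into code, the loop described above looks roughly like this; accept_mutex_on() / accept_mutex_off() stand for whatever platform-dependent locking is configured, and the remaining names are placeholders:

    #include <sys/socket.h>
    #include <unistd.h>

    /* Placeholders standing in for Apache's real globals and helpers. */
    extern volatile int die_now;
    extern int my_generation, global_generation;
    void accept_mutex_on(void);
    void accept_mutex_off(void);
    void process_connection(int conn_fd);

    /* Condensed sketch of a Preforking child server's request-response loop. */
    static void child_loop(int listen_fd)
    {
        while (!die_now && my_generation == global_generation) {
            accept_mutex_on();             /* become the listener (leader) */
            int conn_fd = accept(listen_fd, NULL, NULL);
            accept_mutex_off();            /* the next idle child may listen now */

            if (conn_fd < 0)
                continue;                  /* e.g. dummy connection or error */
            process_connection(conn_fd);   /* become a worker */
            close(conn_fd);
        }
    }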
Accept Mutex vs. select() In an inetd server (see section 4.3.1), there is only one process
waiting for a TCP/IP connection request. Within the Apache HTTP server, there are possibly
hundreds of idle child servers concurrently waiting for a connection request on more than
one server port. This can cause severe problems on some operating systems.
Example: Apache has been configured to listen on the ports 80, 1080 and 8080. 10 idle child server processes wait for incoming TCP/IP connection requests using the blocking system call select() (they are inactive until the status of one of the ports changes). Now a connection request for port 1080 comes in. 10 child server processes wake up, check the port that caused them to wake up and try to establish the connection with accept(). The first is successful and processes the request, while 9 child servers keep waiting for a connection at port 1080 and none at ports 80 and 8080! (This worst–case scenario only applies to a blocking⁵ accept().)
⁴ A mutex is a semaphore used to enforce mutual exclusion for the access to a resource. A semaphore is a means for inter–process communication (IPC): a process can increment or decrement the value of a semaphore. If a process tries to decrement while the value is zero, it is suspended (blocked) until another process increments the value of the semaphore. To implement a mutex with a semaphore, you have to set the maximum value to 1.
⁵ “Blocking” means: any process calling accept() is suspended. If a connection request arrives, the first one resumes its operation.
In a scenario where multiple child servers wait to service multiple ports, the select()/accept() pair is therefore not sufficient to achieve mutual exclusion between the workers, so Preforking has to use the accept mutex.
In general it is a bad idea to waste operating system resources on handling concurrency. As some operating systems cannot queue the child server processes waiting for a connection request, Apache has to do it.
A multiprocessing architecture’s main task is to provide a fast responding server which uses
the underlying operating system efficiently. Usually each architecture has to accept a trade–
off between stability and performance.
In the case of an HTTP server, the multitasking architecture describes the strategy for creating and controlling the worker tasks and how they get a request to process.
The first choice concerns the tasks: depending on the platform, the server can use processes or threads or both to implement the tasks. Processes have a larger context (for example the process’s memory), which affects the time needed to switch between processes. Threads are more lightweight because they share most of the context; unfortunately, bad code can corrupt other threads’ memory or, even worse, crash all threads of the process.
The next aspect concerns the way the tasks communicate (Inter Task Communication). In general, this can be done by shared memory, signals or events, semaphores or mutexes, and pipes and sockets.
As all MPMs use a Task Pool strategy (idle worker tasks remain suspended until a request comes in, which can then immediately be processed by an idle worker task), there must be a means to suspend all idle worker tasks and wake up one whenever a request occurs. For this, an operating system mechanism like a condition variable or a semaphore must be used to implement a mutex. The tasks are suspended when calling a blocking procedure to get the mutex.
The server sockets are a limited resource; therefore there can only be one listener per socket, or only one listener at all. Either there are dedicated listener tasks that have to use a job queue to hand over request data to the worker tasks, or all server tasks play both roles: one idle worker becomes the listener, receives a request and then becomes a worker processing the request.
Finally a task can control the task pool by adjusting the number of idle worker tasks within
a given limit.
Apache includes a variety of multitasking architectures. Originally Apache supported different architectures only to support different operating systems. Apache 1.3 had two major architectures, which had to be selected at compile time using environment variables that the precompiler used to execute macros, which in turn selected the corresponding code for the operating system used: the Preforking architecture on Unix platforms and a multithreaded architecture on Windows.
[Figure 4.10: The overall behavior of Apache (cf. figure 4.7) with the MPM’s area of responsibility marked by a dotted line: running the open_logs and child_init hooks, the master server loop that waits for a child’s death or a time–out and adjusts the number of idle child servers, and the child servers’ initialization and request–response loop (wait for connection, wait for request, process request, update scoreboard) until shutdown, restart, a kill or a new generation]
It is the MPM’s responsibility to take care of starting threads and/or processes as needed.
The MPM will also be responsible for listening on the sockets for incoming requests. When
requests arrive, the MPM will distribute them among the created threads and/or processes.
These will then run the standard Apache request handling procedures. When restarting or
shutting down, the MPM will hand back to the main server. Therefore all server function-
ality is still the same for any MPM, but the multiprocessing model is exchangeable. Figure
4.10 shows the responsibility of an Apache MPM in the overall behavior of Apache (see also
figure 4.7). The dotted line marks the actions for which the MPM takes responsibility.
Version 2.0 currently includes the following MPMs:
• Preforking and Netware — MPMs that resemble the functionality of the Preforking
architecture of Apache 1.3
• WinNT — Apache 1.3’s Win32 version was similar to this, however the WinNT MPM
is enhanced by the IOCP operating system concept
• Worker — A new MPM that makes use of both processes and threads and performs
better than Preforking
• Leader and PerChild — Two MPMs still in an experimental state, which offer alternatives to Preforking and Worker on Linux–based systems
The initialization procedure of the Win32 multitasking architecture closely resembles the
one described for the Preforking architecture. All initialization is similar up to the point where the Preforking MPM is called or the Apache 1.3 architecture starts to create child server processes.
Both the 1.3 and the 2.0 versions use the master server process as the supervisor process, which in turn creates a second process that contains all the worker threads. When started, the worker process only contains a single master thread, which then spawns the fixed number of worker threads. These correspond to the child server processes within the Preforking architecture. The Windows multitasking version uses a fixed number of threads, since idle threads impose almost no performance penalty. Therefore as many threads as are desirable for optimum performance are started right away, and the server can be used to its maximum capability without the overhead and delay of spawning new processes dynamically.
Both Windows multitasking architectures only support graceful restart or shutdown.
[Figure: Structure of the Windows multitasking architecture: clients connect via HTTP; the administrator controls the master server process, which starts and shuts down the worker process; inside the worker process a controller (master thread, child_main) starts the listener(s) and the worker threads 1..N, which communicate through the job queue (IOCP) and a mutex; the workers serve documents, scripts and files according to the global and local (.htaccess) configuration data; a scoreboard holds the server status, the exit flag and the maximum number of requests per child]
Events are used for signaling and for communication between the supervisor (master server) process and the worker process:
• The master server process can signal the worker process that a shutdown or graceful
restart is in progress.
• On the other hand, the worker process can signal the master server process that it needs to be restarted or that a serious error occurred that requires shutting down the server.
The worker process itself uses various means for communication with the listener(s) and
the worker threads. When a shutdown event occurs the master thread puts “die”– jobs into
the job queue or the IOCP used. Thus idle and sleeping worker threads are woken and exit,
while worker threads that are busy handling a request can complete the request and quit
later. Additionally it sets various exit flags that can be read by the listener(s) as well as the
worker threads.
However, the job queue or the IOCP, respectively, is also used by the listeners to communicate arriving connections to the worker threads for request handling.
The master server process is called the supervisor process when entering the restart loop. It contains only one thread and is used to monitor the second process, called the worker process, to be able to restart it in case it crashes. The user can communicate with this specific process using the control manager, which can be found on any Windows platform. The control manager then sends an event to the server which signals a restart or shutdown. Additionally the Apache server supplies command line options that can be used for signaling.
The worker process contains three kinds of threads: One master thread, a fixed number of
worker threads and one or multiple listeners. The master starts one or multiple listeners
which accept the connection requests and put the connection data into a job queue (like a
gatekeeper). The worker threads fetch the connection data from the queue and then read
and handle the request by calling the core’s request handling routine which is used by all
multitasking architectures. The communication between the master and the worker threads
is also accomplished via the job queue. However the only communication necessary be-
tween master and worker thread is to signal a worker to exit. If the master thread wants
to decrease the number of worker threads due to a pending shutdown or restart, it puts
"die"–jobs into the queue.
Instead of a self–made job queue, the MPM of version 2.0 uses the IOCP on Windows NT platforms. The advantage of the I/O Completion Port is that it enables the server to specify an upper limit of active threads. All worker threads registering with the IOCP are put to sleep as if registering with a job queue. When any of the events the IOCP is responsible for occurs, one worker thread is woken to handle the event (in Apache that can only be a new incoming connection request). If however the limit of active threads is exceeded, no threads are woken to handle new requests until another thread blocks on a synchronous call or re–registers with the IOCP. That technique is used to prevent excessive context switching and paging due to large numbers of active threads.
Worker Threads are kept pretty simple in this architecture model. As they share the memory of their parent process (the worker process), they do not need to initialize a lot of memory. All they maintain is a counter of requests that the thread has processed. The MPM also keeps a variable containing the current operating system version, so that either the job queue or the IOCP is chosen when trying to get a connection. Therefore the initialization is very short.
After initialization the worker registers with the IOCP or the job queue to retrieve a connection which it can handle. After receiving the connection it calls the core’s request processing routine. It continues to do that until it is given a “die”–job from the queue, which causes it to exit.
The Worker MPM is a Multiprocessing Model for the Linux/Unix Operating System Plat-
form. In contrast to the Preforking and WinNT Model, this MPM uses a combination of a
multiprocessing and a multithreading model: It uses a variable number of processes, which
include a fixed number of threads (see figure 4.12). The preforking model on process level
is extended by a job queue model on thread level.
[Figure 4.12: Structure of the Worker MPM: the master server process starts child processes and controls them via the pipe of death; in each child process a starter thread (child_main) creates a listener thread and the worker threads 1..N, which communicate through a job queue and an idle worker queue; the listeners of the different processes are serialized by the accept mutex; the workers serve documents, scripts and files according to the global and local (.htaccess) configuration data; the scoreboard (shared memory) holds the server status and the generation]
Still the master server process adjusts the number of idle processes within a given range, based on the server load and the max_child, max_idle and min_idle configuration directives. Each child process incorporates a listener thread, which listens on all ports in use by the server. Multiple processes and therefore multiple listener threads are mutually excluded using the accept mutex, like in the Preforking model.
Initialization of a child server is a more complex task in this case, as a child server is a more complex structure. First the master server creates the child process, which in turn starts a so–called starter thread that has to set up all worker threads and a single listener thread. This behavior is reflected in figure 4.12.
Within each child process, the communication between the listener and all worker threads is organized with two queues, the job queue and the idle queue. A listener thread will only apply for the accept mutex if it finds a token in the idle queue indicating that at least one idle worker thread is waiting to process a request. If the listener gets the mutex, it waits for a new request and puts a job item into the queue after releasing the accept mutex. Thus it is ensured that an incoming request can be served by a worker immediately.
After completing a request or a connection with multiple requests (see section 2.3.4 for de-
tails) the worker thread registers as idle by putting a token into the idle queue and returns
to wait for a new item in the worker queue.
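A condensed sketch of the listener logic just described; the queue, mutex and flag helpers are placeholders for the real implementations:

    #include <sys/socket.h>

    /* Placeholders for the real queue, mutex and flag implementations. */
    extern volatile int workers_may_exit;
    void idle_queue_wait_for_token(void);  /* blocks until a worker registered as idle */
    void job_queue_put(int conn_fd);
    void accept_mutex_on(void);
    void accept_mutex_off(void);

    /* Sketch of the Worker MPM listener thread. */
    static void *listener_thread(void *arg)
    {
        int listen_fd = *(int *)arg;

        while (!workers_may_exit) {
            idle_queue_wait_for_token();   /* only listen if a worker is idle */

            accept_mutex_on();             /* compete with the other processes' listeners */
            int conn_fd = accept(listen_fd, NULL, NULL);
            accept_mutex_off();

            if (conn_fd >= 0)
                job_queue_put(conn_fd);    /* an idle worker picks it up immediately */
        }
        return (void *)0;
    }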
Advantages of this approach are that it combines the stable concept of multiprocessing with
the increased performance of a multithreading concept. In case of a crash, only the process
that crashed is affected. In multithreading a crashing thread can affect all threads belonging
to the same parent process. Still threads are a lot more lightweight and therefore cause less
performance overhead during start–up and consume less memory while running.
The MPMs mentioned so far are the ones used most often. Additionally there are other MPMs available. However, most of these mainly serve an experimental purpose and are seldom used in production environments.
Leader MPM
This MPM uses the preforking (Leader–Follower, see also figure 4.5 and the pattern descrip-
tion in [5]) model on both process and thread level using a sophisticated mechanism for the
followers queue:
Each child process has a fixed number of threads, like in the Worker MPM. However, threads are not distinguished into worker and listener threads. Idle workers are put onto a stack. The topmost worker is made listener and will, upon receiving a connection, immediately become a worker and handle the connection itself. The worker on the stack below it will become the new listener and handle the next request. Upon finishing a request, the worker returns to the top of the stack.
This approach addresses two performance issues. First there is no delay due to handing
a connection to a different task using a job queue, since each thread simply handles the
connection it accepts. Secondly since follower threads are organized in a stack, only one
thread is woken when the listener position becomes available. The overhead that is caused
when all threads are woken to compete for the mutex is avoided.
A thread returning to the stack is put on top. Therefore it is most likely that a thread on top
will handle more requests than a thread at the bottom. Considering the paging techniques
for virtual memory that most operating systems use, paging is reduced as more often used
threads do more work and thus are less likely to be swapped to the hard disk.
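The follower stack can be sketched with POSIX threads as follows; the essential point is that each idle worker sleeps on its own condition variable, so only the topmost one is woken. All names are invented and the per-worker condition variables are assumed to be initialized elsewhere:

    #include <pthread.h>
    #include <stddef.h>

    /* Sketch of the Leader MPM follower stack. */
    typedef struct worker {
        pthread_cond_t  wakeup;            /* this worker's private wake-up signal */
        int             promoted;          /* set when this worker becomes listener */
        struct worker  *below;             /* next worker down the stack */
    } worker;

    static pthread_mutex_t stack_lock = PTHREAD_MUTEX_INITIALIZER;
    static worker *top = NULL;

    static void push_idle(worker *w)       /* called by a worker that finished a request */
    {
        pthread_mutex_lock(&stack_lock);
        w->promoted = 0;
        w->below = top;
        top = w;
        while (!w->promoted)
            pthread_cond_wait(&w->wakeup, &stack_lock);   /* sleep until made listener */
        pthread_mutex_unlock(&stack_lock);
    }

    static void wake_next_listener(void)   /* called by the listener that got a connection */
    {
        pthread_mutex_lock(&stack_lock);
        if (top != NULL) {
            worker *w = top;
            top = w->below;
            w->promoted = 1;
            pthread_cond_signal(&w->wakeup);              /* wake exactly this one thread */
        }
        pthread_mutex_unlock(&stack_lock);
    }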
Per–Child MPM
Based on the Worker MPM, this experimental MPM uses a fixed number of processes, which in turn have a variable number of threads. This MPM also uses the preforking model on both the process and the thread level. The advantage of this approach is that no new processes have to be started or killed for load balancing.
An advantage of this MPM: Each process can have a separate UserID, which in turn is as-
sociated with different file access and program execution rights. This is used to implement
virtual hosts with different rights for different users. Here each virtual host can have its own
process, which is equipped with the rights for the corresponding owner, and still the server
is able to react to a changing server load by creating or destroying worker threads.
4.4.1 Overview
The Request–Response Loop is the heart of the HTTP Server. Every Apache child server
processes this loop until it dies either because it was asked to exit by the master server or
because it realized that its generation is obsolete.
Figure 4.13 shows the request–response loop and the keep–alive loop. To be exact, the
request–response loop deals with waiting for and accepting connection requests while the
keep–alive loop deals with receiving and responding to HTTP requests on that connection.
Depending on the multitasking architecture used, either each idle worker tries to become
listener, or it waits for a job in a job queue. In both cases it will be suspended until the mutex
or the job queue indicates either that it will be the next listener or that a new job is in the
queue.
The transitions in the rectangle “wait for TCP request” in figure 4.13 show the Leader–
Follower model of the Preforking MPM: The child server task tries to get the accept mutex to
become listener and will be suspended again until a TCP connection request comes in. It ac-
cepts the connection and releases the accept mutex. (see also child_main() in prefork.c).
[Figure 4.13: The request–response loop and the keep–alive loop of a child server: waiting for a TCP request (on one or more server ports, limited by max_requests_per_child), setting the status to busy, allocating a bucket / request pool, reading and processing requests in the keep–alive loop (status keepalive), destroying the request pool and closing the connection]
After a child server has received a connection request, it leaves the scope of the MPM and triggers the hooks pre_connection and process_connection. The module http_core.c
registers the handler ap_process_http_connection() for the latter hook which reads and
processes the request.
An HTTP client, for example a browser, re–uses an existing TCP connection for a sequence
of requests. An HTML document with 5 images results in a sequence of 6 HTTP requests
that can use the same TCP connection. The TCP connection is closed after a time–out period
(usually 15 seconds). As the HTTP header used to control the connection had the value
“keep–alive”, the loop carries this name.
The keep–alive loop for reading and processing HTTP requests is specific for
HTTP. Therefore in Apache 2, the module http_core.c registers the handler
ap_process_http_connection() which includes the keep–alive loop. Similar to the tran-
sient pool, the request pool is cleared with every run of the keep–alive loop.
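The shape of such a keep–alive loop can be sketched as follows; the types and helper functions are simplified placeholders, not Apache's API:

    #include <stddef.h>
    #include <unistd.h>

    /* Minimal placeholder type; Apache's real request_rec is far richer. */
    typedef struct {
        int keep_alive;                    /* decided from the HTTP request headers */
    } simple_request;

    simple_request *read_next_request(int conn_fd);   /* placeholders, not Apache API */
    void handle_request(simple_request *r);
    void destroy_request(simple_request *r);

    /* Simplified shape of the keep-alive loop for one TCP connection. */
    static void keep_alive_loop(int conn_fd)
    {
        for (;;) {
            simple_request *r = read_next_request(conn_fd);  /* parse next request header */
            if (r == NULL)
                break;                     /* time-out, end of stream or parse error */

            handle_request(r);

            int keep = r->keep_alive;
            destroy_request(r);            /* corresponds to clearing the request pool */
            if (!keep)
                break;
        }
        close(conn_fd);
    }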
The child server reads the request header (the request body will be treated by the cor-
responding content handler) with the procedure ap_read_request() in protocol.c. It
stores the result of the parsing in the data structure request_rec. There are many errors
that can occur in this phase⁶. Note that only the header of the HTTP request is read at this
time!
After the HTTP header has been read, the child server status changes to “busy_write”. Now
it’s time to respond to the request.
Figure 4.14 shows the details of the request processing in Apache 2.0. Request processing in Apache 1.3 is almost the same. The only major exception is that in Apache 1.3 only a single content handler can be used, whereas in Apache 2.0 multiple modules can take part in forming the response, as the filter concept is used.
The procedure ap_process_request() in http_request.c calls
process_request_internal() in request.c. What happens in this procedure is
shown in figure 4.14 which is similar to figure 3.9 in section 3.3 on page 44, but provides
technical details and explanations:
• Apache retrieves the configuration for the Request URI: location_walk(). This
happens before a module translates the URI because it can influence the way the URI
is translated. Detailed information about Apache’s configuration management can be
found in section 4.5.
⁶ If you want to check this, establish a TCP/IP connection with telnet (telnet hostname port) and type in a correct HTTP request according to HTTP/1.1.
[Figure 4.14: Request processing in Apache 2.0: the configuration walks (location_walk(), directory_walk(), file_walk()), the hooks with their return–value conventions, and a matrix mapping the program statements (check_access(r), some_auth_required(r), auth_type(r), satisfies(r), decl_die()) to the access and authentication checks. The hook annotations in the figure read roughly as follows:]
• translate_name: this routine gives a module the opportunity to translate the URI into an actual filename. If no module does anything special, the server’s default rules (Alias directives and the like) will continue to be followed. The return value is OK, DECLINED, or HTTP_mumble; if a handler returns OK, no further modules are called for this phase.
• map_to_storage: the core handler reads the configuration with directory_walk(), file_walk() and another location_walk().
• header_parser: this routine is called to give the module a chance to look at the request headers and take any appropriate specific actions early in the processing sequence. The return value is OK, DECLINED, or HTTP_mumble; if a handler returns OK, any remaining modules with handlers for this phase will still be called.
• access_checker: this routine is called to check for any module–specific restrictions placed upon the requested resource (see the mod_access module for an example). The return value is OK, DECLINED, or HTTP_mumble. All modules with a handler for this phase are called regardless of whether their predecessors return OK or DECLINED; the first one to return any other status, however, will abort the sequence (and the request) as usual.
• check_user_id: this routine is called to check the authentication information sent with the request (such as looking up the user in a database and verifying that the [encrypted] password sent matches the one in the database). The return value is OK, DECLINED, or some HTTP_mumble error (typically HTTP_UNAUTHORIZED); if a handler returns OK, no other modules are given a chance at the request during this phase.
• auth_checker: this routine is called to check whether the resource being requested requires authorisation. The return value is OK, DECLINED, or HTTP_mumble; if a handler returns OK, no other modules are called during this phase. If all modules return DECLINED, the request is aborted with a server error.
• ap_translate_name(): Some module handler must translate the request URI into a
local resource name, usually a path in the file system.
• Again it gets the pieces of configuration information for the Request URI with
location_walk() (the URI can differ from the request URI after being processed
by ap_translate_name()!). The core handler for the hook map_to_storage,
core_map_to_storage() calls ap_directory_walk() and ap_file_walk() which
collect and merge configuration information for the path and the file name of the re-
quested resource. The result is the configuration that will be given to all module han-
dlers that process this request.
(“walk”: Apache traverses the configuration information of every section of the path or URI from the root to the leaf and merges them. The .htaccess files are read by directory_walk() in every directory and by file_walk() in the leaf directory.)
• header_parser: Every module has the opportunity to process the header (to read
cookies for example).
2. Authorization check based on the identity of the client user (to get the identity, an authentication check is necessary)
• the rules for the authorization check (allow/deny IP addresses or users; either both the IP and the identity check must succeed, or only one of them)
The complex behavior of the authorization check could not be illustrated completely in fig-
ure 4.14. Use the matrix on the left–hand side to map the program statements to the opera-
tions.
• type_checker: get the MIME type of the requested resource. This is important for
selecting the corresponding content handler.
• fixups: Any module can write data into the response header (to set a cookie, for
example).
• handler: The module registered for the MIME type of the resource offers one or more content handlers to create the response data (header and body). Some handlers, e.g. the CGI module’s content handler, read the body of the request. The content handler sends the response body through the output filter chain.
The error handling shown on the left side is ’misused’ if Apache wants to skip further re-
quest processing. The procedure ap_die() checks the status and sends an error message
only if an error occurred. This “misuse” happens for example if ap_translate_name is
successful (it returns “DONE”)!
For more information on filters, check section 3.3.4.
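To make the hook and return–value conventions concrete, here is a minimal Apache 2.0 style translate_name handler; the module name, URI prefix and target directory are invented for illustration, and error handling and configuration are omitted:

    #include <string.h>
    #include "httpd.h"
    #include "http_config.h"
    #include "apr_strings.h"

    /* Hypothetical module: map /example/* to a fixed directory. */
    static int example_translate_name(request_rec *r)
    {
        if (strncmp(r->uri, "/example/", 9) != 0)
            return DECLINED;               /* let other modules or the core translate it */

        r->filename = apr_pstrcat(r->pool, "/var/www/example", r->uri + 8, NULL);
        return OK;                         /* no further translate_name handlers run */
    }

    static void example_register_hooks(apr_pool_t *p)
    {
        (void)p;
        ap_hook_translate_name(example_translate_name, NULL, NULL, APR_HOOK_MIDDLE);
    }

    module AP_MODULE_DECLARE_DATA example_module = {
        STANDARD20_MODULE_STUFF,
        NULL, NULL, NULL, NULL,            /* no per-dir/per-server configuration here */
        NULL,                              /* no command table */
        example_register_hooks
    };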
[Figure 4.15: System structure of Apache focusing on the configuration processor: the administrator supplies command line parameters and configuration files, and the request processor serves the browsers using the resulting configuration data]
After Apache has been built, the administrator can configure it at start–up via command
line parameters and global configuration files. Local configuration files (usually named
.htaccess) are processed during each request and can be modified by web authors.
Figure 4.15 shows the system structure of Apache focusing on the configuration pro-
cessor. At the bottom, we see the administrator modifying global configuration files
like httpd.conf, srm.conf, access.conf and local configuration files ht1 to ht5
(.htaccess). The administrator starts Apache passing command line parameters. The con-
fig reader then reads and processes the global configuration files and the command line
parameters and stores the result in the config data storage, which holds the internal config-
uration data structures.
For each request sent from a browser, the request processor advises the ’per request config
generator’ to generate a per–request config valid for this request. The per request config
generator has to process the .htaccess files it finds in the resource’s path and merge them
with the config data. The request processor now knows how to map the request URI to a
resource and can decide if the browser is authorized to get this resource.
[Figure 4.16: When Apache processes configuration data (excerpt of figure 4.7): read_config (initial) during initialization, read_config (operational) on entering the restart loop, and generate per request config while processing requests, until restart, shutdown or termination]
Figure 4.16 shows the situations in which Apache processes configuration data; the diagram is an excerpt of figure 4.7. After the master server has read the per–server configuration, it enters the ’Restart Loop’. It deletes the old configuration and processes the main configuration file again, as it does on every restart.
In the ’Request–Response Loop’, a child server generates the per–request configuration for
each request. As the configuration data concerning the request is tied to the per–request
data structure request_rec, it is deleted after the request has been processed.
In the next parts, we will first take a look at the data structures that are generated when pro-
cessing the global configuration files. After that we take a look at the source code responsible
for doing this. Then we describe the processing of configuration data on requests.
[Figure 4.17: The internal configuration data structures of one virtual host: module_config and lookup_defaults reference, for every module, one per_server_config (for the core: core_server_conf with the sec array for the <Directory[Match]> sections and the sec_url array for the <Location[Match]> sections of the virtual host) and one per_directory_config (for the core: core_per_dir_conf with a sec array for the <Files[Match]> sections in directory or default context)]
Each virtual host has exactly one per_server_config containing per–server information and one per_directory_config containing defaults for the per–directory configuration.
[Figure 4.18: The configuration data structures in detail: the server_rec (server_admin, server_hostname, port, is_virtual, error_log, access_confname, srm_confname, next, ...) with its module_config array (one entry per module, up to total_modules + DYNAMIC_MODULE_LIMIT); the core_server_config with access_name, document root and the sec and sec_url arrays of core_dir_config entries (each holding d (directory), r (regex), fn_match, override and a sec table) for the <Directory[Match]>, <Files[Match]> and <Location[Match]> sections; and the structures registered by the core module and mod_so: the command_rec table (<VirtualHost>, <Directory>, <Files>, <Location>, AccessConfig, DocumentRoot, AddModule, LoadFile, LoadModule, ...), the handler_rec table with the default handler for */*, top_module and the list of loaded modules]
(command line parameter -t) or if Apache runs in inetd mode. The second pass is necessary,
as Apache needs to get the actual configuration on every restart.
Apache calls the function ap_read_config() for processing the configuration when start-
ing or restarting. The function is called for the first time in main() and afterwards in the
’Restart Loop’.
Figure 4.19: Structure of the configuration processor (focusing on the data flow)
Figure 4.19 shows the data flow in the structure of the global configuration processor:
The agent process_command_config is responsible for reading command line parameters
from the storages ap_server_pre_read_config and ap_server_post_read_config,
while the agent process_resource_config reads the global configuration files.
Both agents pass their data to the Line–by–line configuration file processor
(ap_srm_command_loop). This is the heart of the configuration processor; it dispatches the processing of each directive to the corresponding command handlers in the modules.
Figure 4.20 shows the layering of the function calls regarding configuration⁷ (note: only the most important procedures are covered):
ap_read_config() calls the procedures process_command_config() and
ap_process_resource_config().
process_command_config() processes directives that are passed to Apache at the com-
mand line (command line options -c or -C). The arguments are stored in the arrays
ap_server_pre_read_config and ap_server_post_read_config when reading com-
mand line options in main(), depending on whether they should be processed before or after
the main configuration file. These arrays are now handled like configuration files and are
passed to the function ap_build_config() (ap_srm_command_loop() in Apache 1.3) in a
cmd_parms data structure, which contains additional information like the affected server,
memory pointers (pools) and the override information (see also figure 4.20).
ap_process_resource_config() actually processes the main configuration file. Apache
has the ability to process a directory structure of configuration files, in case a directory name
instead of a filename was given for the main configuration file. The function calls itself recur-
sively for each subdirectory and so processes the directory structure. For each file that has
been found at the recursion endpoint, a cmd_parms structure containing a handle to the configuration file is initialized and passed to ap_build_config() (ap_srm_command_loop() in Apache 1.3).
⁷ Layer diagram of function calls: a line crossing another line horizontally in a circle means that the box where the line starts contains the name of the calling procedure, whereas the box with the line that crosses vertically contains the name of the called one.
[Figure 4.20: Layering of the configuration–related function calls, from apache_main / standalone_main / process_request via ap_read_config(), process_command_config(), ap_process_resource_config(), directory_walk(), ap_parse_htaccess() and ap_build_config() (which uses ap_handle_command(), ap_cfg_getline(), ap_find_command_in_modules(), ap_set_config_vectors(), invoke_command(), execute_now() and ap_get_word_conf()) down to the command handlers include_config, virtualhost_section, dirsection, filesection and urlsection for the Include, <VirtualHost>, <Directory[Match]>, <Files[Match]> and <Location[Match]> directives]
Processing a Directive
Figure 4.20 also shows some of the command handlers of the core module (the names in brackets show the Apache 1.3 versions):
dirsection (dirsection), filesection (filesection) and urlsection (urlsection)
are the corresponding functions to the <Directory>, <Files> and <Location> directives.
Again they use ap_build_config() (ap_srm_command_loop()) to handle the directives
inside a nested section.
As an example, we take a look at how Apache processes a <Directory> directive by invoking the command handler dirsection(). This can be seen in figure 4.21. dirsection() then calls ap_build_config() (ap_srm_command_loop()) to process all directives in this nested section line by line.
If Apache detects the directive </Directory>, it invokes the corresponding command han-
dler, which returns the found </Directory> string as an error message, so the processing of
lines is stopped and ap_build_config() (ap_srm_command_loop()) returns. If it returns
NULL it has finished processing the configuration file and has not found the corresponding
end section tag. The calling dirsection() function returns a ’missing end section’ error.
[Figure 4.21: Processing of a <Directory> section: dirsection() creates a new per–directory configuration list, sets cmd->endtoken and calls ap_build_config() (ap_srm_command_loop() in Apache 1.3), which reads the configuration source line by line (ap_cfg_getline, incrementing the line number), finds the command in the modules, dispatches it to the command handler (ap_handle_command / invoke_cmd / execute_now()) and returns either when the expected end–section token is found, on an error, or at EOF (which dirsection() reports as an unclosed section)]
Otherwise, it adds a per–directory configuration array to the data structures of the corre-
sponding server (virtual host). The end_nested_section() function knows for which sec-
tion end it has to look because the name is stored in the cmd_parms structure which is passed
to the command handlers.
The <VirtualHost> directive works similarly; the Include directive simply calls ap_process_resource_config() again to process an additional configuration file.
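For comparison, this is roughly how a module registers a directive of its own with the configuration processor (Apache 2.0 macros; the directive and handler are invented, and the return convention is the same NULL–or–error–string convention used by the section handlers above):

    #include "httpd.h"
    #include "http_config.h"

    /* Hypothetical command handler for an invented one-argument directive.
     * Returning NULL means success; any other string is treated as an error
     * message by the configuration file processor. */
    static const char *set_example_greeting(cmd_parms *cmd, void *dircfg,
                                            const char *arg)
    {
        (void)cmd;
        (void)dircfg;
        (void)arg;
        /* A real module would store arg in its per-server or per-directory
         * configuration structure (created by the module's config hooks). */
        return NULL;
    }

    /* The command table that the configuration processor consults (via
     * ap_find_command_in_modules()) when it encounters a directive. */
    static const command_rec example_cmds[] = {
        AP_INIT_TAKE1("ExampleGreeting", set_example_greeting, NULL, RSRC_CONF,
                      "one-argument example directive (invented for illustration)"),
        { NULL }
    };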
Apache reads configuration data at start–up and stores it in internal data structures. When-
ever a child server processes a request, it reads the internal configuration data and merges
the data read from the .htaccess files to get the appropriate configuration for the current
request.
Figure 4.22 presents the system structure of the configuration processor and its storages
containing internal configuration data structures for one virtual host. These configura-
tion data structures have been generated at start–up and are presented in detail in figures
4.17 and 4.18. Here, the sec, sec_url, server_rec and lookup_defaults structures are
shown. The name of the files that are to be processed on a request is also stored in the
core_server_config and is .htaccess by default. The configuration for the request is
generated in the request_rec data structure, which is represented on the right side of the
diagram and also provides other required information to the walk functions.
Figure 4.22: Structure of the per–request configuration processor: The walk procedures
[Figure: Flow of directory_walk(): get the core_server_config, start with lookup_defaults as per_dir_defaults, loop over the directory section entries sec[j] while the number of slashes in filename is larger than the counter i (breaking out for regex entries, entries whose directory does not start with ’/’, or entries with more slashes than i), compare entry_dir using fnmatch or strncmp, merge matches into per_dir_defaults, include .htaccess files via ap_parse_htaccess() where overrides allow it, compare the regex entries in a separate pass, and finally set r->per_dir_config = per_dir_defaults and return OK]
1. The left path is taken if there is no filename given and just sets the URI as filename.
2. The second path is taken if filename does not start with a ’/’. Here, the directory walk just loops through the directory section entries and compares filename to the entries.
For the comparison it uses either the entry’s fnmatch pattern (using a compare function specified in POSIX), a regular expression entry, or simply a string compare with filename.
It loops through the array of entries and tests each section if it matches and merges it
on a hit in per_dir_defaults.
3. The third path uses a nested loop. Here, the order of the entries is of importance (see
ap_core_reorder_directories() in http_core.c). The directory sections are or-
dered such that the 1–component sections come first, then the 2–component, and so
on, finally followed by the ’special’ sections. A section is ’special’ if it is a regex (reg-
ular expression), or if it doesn’t start with a ’/’.
The outer loop runs as long as the number of slashes in filename is larger than i (a
counter which is incremented on each pass). If i is larger, the possibly matching sections have already been passed.
The nested loop actually walks through the entries, memorizing its position in j. If the current entry is a regular expression or if its directory name does not start with a ’/’, the inner loop breaks because it has entered the ’special’ sections, and the outer loop is finished, too. Regular expressions are compared later on in a separate loop.
If the inner loop breaks because the number of slashes in the directory name of the
entry is larger than i, the entry is skipped and the .htaccess file of the corresponding
directory is included if allowed. Then the outer loop starts a new cycle. This way, all
relevant .htaccess files are included.
If no break occurs we are in the right place. In the inner loop the directory name of
the entry is compared with fnmatch or strncmp. On a match the result is merged in
per_dir_defaults.
The override information is applied, and wherever a .htaccess file has the permission to override anything, the procedure tries to find one.
If a .htaccess file has to be parsed, ap_parse_htaccess() is invoked. This proce-
dure in turn calls ap_build_config() (see figure 4.20), which works the same way
as at start–up for the main configuration files, but this time on the per_dir_config
structure of the request_rec.
file_walk() works only on the per_dir_config of the request_rec because the struc-
tures for the file directives are already copied from the core_server_config’s and
Apache has different built-in pools that each have a different lifetime. Figure 4.24 shows
the hierarchy of the pools. The pool pglobal exists for the entire runtime of the server.
The pchild pool has the lifetime of a child server. The pconn pool is associated with each connection and the preq pool with each request. Usually a developer should use the pool
with the minimum acceptable lifetime for the data that is to be stored to minimize resource
usage.
If a developer needs a rather large amount of memory or other resources and cannot find a
pool with the acceptable lifetime, a sub pool to any of the other pools can be created. The
program can then use that pool like any other and can additionally destroy that pool once
it is not needed any longer. If the program forgets to destroy the pool, it will automatically
be destroyed once the Apache core destroys the parent pool. All pools are sub pools of the
main server pool pglobal. A connection pool is a sub pool of the child server pool handling
the connection and all request pools are sub pools of the corresponding connection pools.
Internal structure
Internally, a pool is organized as a linked list of sub pools, blocks, processes and callback
functions. Figure 4.25 gives a simple view of a block and an example of a linked list of
blocks.
If memory is needed, it should be allocated using predefined functions. These functions do
not just allocate the memory but also keep references to that memory to be able to deallocate
it afterwards. Similarly processes can be registered with a pool to be destroyed upon death
of the pool. Additionally each pool can hold information about functions to be called prior
to destroying all memory. That way file handles, and thus also sockets, can be registered with a pool to be destroyed.
[Figure 4.25: A memory block (its size, the pointers first_avail and end, and a next pointer, here NULL) and a linked list of such blocks]
Termination sequence
When a pool is destroyed, Apache first calls all registered cleanup functions. All registered
file handles, and that way also sockets, are closed. After that, a pool starts to terminate all
registered and running processes. When finished with that, all blocks of memory are freed.
Usually Apache does not really free the memory for use by any other program but deletes it
from the pool and adds it to a list of free memory that is kept by the core internally. That way
the costly procedure of allocating and deallocating memory can be cut down to a minimum.
The core is now able to assign already allocated memory to the instance in need. It only
needs to allocate new memory if it has used up all memory that was deallocated before.
Apache allocates memory one block at a time. A block is usually much bigger than the
memory requested by modules. Since a block always belongs to one pool, it is associated
with the pool once the first bit of memory of that block is used. Subsequent requests for
memory are satisfied by memory left over in that pool. If more memory is requested than
left over in the last block, a new block is added to the chain. The memory left over in the
previous block is not touched until the pool is destroyed and the block is added to the list
of free blocks.
Since blocks are not deallocated but added to a list of free blocks, Apache only needs to
allocate new blocks once the free ones are used up. That way the performance overhead is
heavily reduced, as the system seldom needs to be involved in memory allocation. Under
most circumstances the amount of memory used to store the same amount of informa-
tion is also smaller than with conventional memory allocation, as Apache always assigns exactly as
much memory as is needed, without a lower limit. The size of a block can be configured using
the Apache configuration.
Since each pool handles the cleanup of resources registered with it itself, the necessary API
functions are mainly used to allocate resources or to register already created resources with
a pool for cleanup. However, some resources can be cleaned up manually before the pool is
destroyed.
Allocate memory
When allocating memory, developers can use two different functions. ap_palloc and
ap_pcalloc both take the pool and the size of the memory needed as arguments and re-
turn a pointer to the memory now registered and available to use. However ap_pcalloc
clears out the memory before returning the pointer.
The function ap_pstrdup is used to allocate memory from a pool and place a copy of the string
passed to the function as an argument in it. ap_pstrcat is used to initialize a newly
allocated string with the concatenation of all strings supplied as arguments.
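For illustration, a minimal sketch of these calls inside a hypothetical helper function (the function name and strings are invented; the calls follow the Apache 1.3 style pool API):

#include "httpd.h"   /* pulls in the pool allocation API (ap_palloc etc.) */

/* Hypothetical helper: builds a greeting string whose memory lives in
 * the request pool and is freed automatically with that pool. */
static char *make_greeting(request_rec *r, const char *user)
{
    /* copy the user name into the request pool */
    char *name = ap_pstrdup(r->pool, user);

    /* concatenate strings; the result is also allocated from r->pool;
     * the argument list must be terminated with NULL */
    return ap_pstrcat(r->pool, "Hello, ", name, "!", NULL);
}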
Basically any resource can be registered with a pool by supplying callback functions. Here
the function which is to be called to free or terminate the resource and the parameters are
registered. Upon the end of the lifetime of a pool these functions are called. That way any
type of resource can make use of the pool technology.
For file descriptors, and thus also for sockets, the core offers equiv-
alents to the fopen and fclose functions. These make use of the callback function regis-
tration. They register a function that can be called to close file descriptors when the pool is
destroyed.
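A sketch of tying an arbitrary resource to a pool via such a callback (the resource type and its cleanup logic are invented; the registration call follows the Apache 1.3 style API):

#include "httpd.h"

/* Invented resource type, used only for illustration */
typedef struct { int handle; } my_resource;

static void my_resource_cleanup(void *data)
{
    my_resource *res = (my_resource *) data;
    /* release whatever res->handle refers to, e.g. close a descriptor */
}

static my_resource *my_resource_create(pool *p)
{
    my_resource *res = (my_resource *) ap_palloc(p, sizeof(*res));
    /* run my_resource_cleanup when p is cleared or destroyed;
     * the same function is registered as the child cleanup here */
    ap_register_cleanup(p, res, my_resource_cleanup, my_resource_cleanup);
    return res;
}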
Process management
Additionally, a pool can manage processes. These are registered with the pool using the
ap_note_subprocess function if the processes already exist. The better way is to use
ap_spawn_child as that function also automatically registers all pipes and other resources
needed for the process with the pool.
Sub pools
Sub pools are the solution if an existing pool is not suitable for a task that may need a large
amount of memory for a short time. Sub pools are created using the ap_make_sub_pool
function. It needs the parent pool handed over as argument and returns the new pool.
Now this pool can be used to allocate resources relatively independent from the parent
pool. The advantage is that the pool can be cleared (ap_clear_pool) or even destroyed
(ap_destroy_pool) without affecting the parent pool. However, when the parent pool is
destroyed, the sub pool is destroyed automatically.
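A short sketch of this usage pattern, assuming the Apache 1.3 style API described above (the surrounding function and buffer size are invented):

#include "httpd.h"

static void do_temporary_work(pool *parent)
{
    /* create a sub pool for short-lived, memory-hungry work */
    pool *tmp = ap_make_sub_pool(parent);

    char *buf = (char *) ap_palloc(tmp, 1024 * 1024);   /* large scratch buffer */
    /* ... use buf ... */

    /* free everything allocated in tmp right away; if this call were
     * forgotten, tmp would still be destroyed with its parent pool */
    ap_destroy_pool(tmp);
}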
Appendix A
Operating System Concepts
A.1 Unix Processes
(Figure A.1: Multiprocessing vs. multithreading: each of the processes 1..n has its own program, virtual memory and processor state (program counter PC and stack), while the threads 1..m of a process share one program and memory space but each keeps its own processor state.)
Imagine a task like a virtual computer — it offers a CPU, memory and I/O. The state of
a task can be found in the processor registers (for example the Program Counter PC) and
the stack (the return addresses, the parameters of the procedure calls and local variables).
The difference between processes and threads, as shown in figure A.1, is the isolation: While
each process has an isolated memory space for its own, all threads (of a process) share one
memory space (including the program, of course). Threads are usually bound to processes
which define the resource set shared by the threads. In the following, we will focus on UNIX
processes.
A.1.1 fork()
In Unix, processes are created using the system call fork(). The first process of a program is
created upon the start of the program. Using fork() an additional process can be created.
Fork simply creates a copy of the current process including all attributes. Like its parent,
the new process continues with the code that follows the fork() instruction. To distinguish
parent from child, the return value of the fork call can be used. Fork will return 0 to the
newly created process while the parent gets the process id of the child process. Therefore
the fork system call is usually followed by a decision based on fork’s return value.
The example code below is a shortened extract of the make_child procedure of the Apache
2.0 preforking MPM:
    if ((pid = fork()) == -1) {
        /* fork failed: reset the scoreboard slot and retry later */
        (void) ap_update_child_status_from_indexes(slot,
            0, SERVER_DEAD, (request_rec *) NULL);
        sleep(10);
        return -1;
    }

    if (!pid) {
        /* In this case fork returned 0, which
         * means this is the new process */
        /* (signal handler setup omitted) */
        child_main(slot);
    }

    /* fork returned the child's pid: this is the parent */
    ap_scoreboard_image->parent[slot].pid = pid;
    return 0;
(Figure A.2: Petri net of make_child for parent and child process: allocate resources for the child, create the process with fork(), then branch on the return value: below zero means error handling, zero means the child continues with child_main, greater than zero means the parent registers the child.)
Look at figure A.2 for a Petri net version of the source code above. Looking at the transi-
tions at the bottom we see that the program provides three possible cases depending on the
return value of fork(). Some of them are grey because they will never be executed at all,
depending on whether the parent or the child process executes the code.
A.1.2 exec()
When a process wants to execute a different program it can use any of the exec() system
calls. Exec discards all data of the current process except the file descriptor table — depending
on the variant of exec, this includes or excludes the environment — and loads and starts the
execution of a new program. It can be combined with a prior fork() to start the program in
a new process running concurrently to the main process.
All variants of exec() serve the same purpose with the exception of the way the command
line arguments are submitted. The different exec calls all expect command line arguments
using different data types which gives flexibility to the programmer. Additionally there are
separate calls that can be used when a new environment is desired for the program that is to
be executed. Otherwise the program will simply use the existing environment of the calling
process.
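As a minimal illustration (plain POSIX code, not taken from Apache), the typical fork()/exec() combination looks like this:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();

    if (pid < 0) {                      /* fork failed */
        perror("fork");
        return 1;
    }
    if (pid == 0) {                     /* child: run a different program */
        execlp("ls", "ls", "-l", (char *) NULL);
        perror("execlp");               /* only reached if exec failed */
        _exit(127);
    }
    /* parent: wait for the child to terminate */
    waitpid(pid, NULL, 0);
    return 0;
}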
A.1.4 kill()
The kill() system call is used to send signals to processes. Even though the name implies
that the kill() system call is solely used to kill processes, there is more behind it. However, a
signal that is neither caught nor ignored terminates a process, which is why this system call
is called kill. The syntax is kill(int pid, int signal). Most signals can be caught or
ignored, however the signal SIGKILL (#9) cannot and will terminate the process in any case.
For more information on signals and the kill() system call look at the next section.
A.2 Signals and Alarms
A.2.1 Signals
Signals are a way of inter–process communication (IPC) that works like an interrupt. To send
a signal to a process, a program uses the system call kill(). A process receiving a signal
suspends execution of its program immediately and executes a signal handler procedure
instead. After that it resumes the program. A process can register a signal handler procedure
for each individual signal type. If no signal handler has been registered for the issued signal,
the default handling is to terminate the process.
A signal handler is a procedure which has to be registered for one or many signals and which
is restricted in its capability to execute operating system calls. Therefore a signal handler
will either react to the signal directly or save necessary information about the signal, so that
the main program can handle it at a later point. The Unix version of Apache uses a very
simple signal handler (see figure 4.8 right–hand side) which just sets flags.
The process can also ignore every signal except SIGKILL which will always terminate the
addressed process. Any signal which is ignored will simply be discarded. Another option is
to block a signal, which is similar to ignoring it. The signal is then not delivered
to the process, just as if it were ignored. However, any arriving signal is saved and will be
delivered to the process once the signal is unblocked. Using that feature, a process can
prevent a signal from interrupting its execution during critical parts of the execution
sequence. Blocking signals at the beginning of a signal handler and re-enabling them at the
end of its execution is a good way to prevent race conditions caused by multiple signal handlers
executing concurrently.
Additionally most UNIX flavours allow associating single signal handlers with a set of sig-
nals. Signal sets can be ignored or blocked as a whole set. When a signal handler is responsi-
ble for a set of signals, the parameter of the handling function will supply information about
which signal triggered the handler.
Signals are, however, a very limited technique. Only a fixed range of signals exists.
Apart from information that might be encoded in the choice of a specific signal type there
is no possibility to send additional information to the addressed process. Furthermore the
routines that handle signals are limited in their capabilities. Only a narrow range of system
calls may be executed.
A.2.2 Usage
signal() and kill()
Signal handlers are registered for a specific signal type using the signal() system call.
Signal() expects an integer for the signal type and a function pointer to the signal han-
dling procedure. Generally a signal can be sent using the kill() system call. This function
requires a process id and a signal type as arguments. On most UNIX systems there is also
a command that can be used on the shell named kill. Users can use it to send signals to
running processes. Unix versions of Apache can be restarted or shut down using the kill
command.
A process can send signals to itself using the raise() call. alarm() instructs the system
to interrupt the process with the SIGALRM signal after a certain time, like an alarm clock.
Thus the alarm() system call can only be used by a process to send one specific signal to itself.
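A minimal sketch of this pattern in plain C, similar in spirit to the flag-setting handler mentioned above (not Apache source):

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigterm = 0;

/* signal handler: do almost nothing, just remember the event */
static void on_sigterm(int signum)
{
    (void) signum;
    got_sigterm = 1;
}

int main(void)
{
    signal(SIGTERM, on_sigterm);        /* register the handler */

    kill(getpid(), SIGTERM);            /* send the signal to ourselves */

    if (got_sigterm)                    /* main program reacts to the flag */
        printf("SIGTERM received, shutting down\n");
    return 0;
}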
Signal Masks
Sets of signals can be managed easily by using so–called signal masks. A set of signals can
be assigned to one single signal handler, or it can be ignored or blocked by a process as a whole.
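For illustration, blocking a set of signals around a critical section with standard POSIX calls might look like this:

#include <signal.h>

/* Block SIGINT and SIGTERM during a critical section, then restore
 * the previous mask; signals arriving in between are delivered
 * afterwards. */
void critical_section(void)
{
    sigset_t block, old;

    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    sigaddset(&block, SIGTERM);

    sigprocmask(SIG_BLOCK, &block, &old);   /* block the set */

    /* ... critical work that must not be interrupted ... */

    sigprocmask(SIG_SETMASK, &old, NULL);   /* restore previous mask */
}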
A.3 Sockets
The technology called sockets is used to enable high level applications and services to com-
municate using the network hardware and protocol services offered by the operating sys-
tem. This section will provide information about sockets and TCP/IP covering both UDP
and TCP. Sockets are also used for other protocols like Netware’s IPX/SPX. The examples in
this section will be restricted to TCP/IP.
TCP/IP offers two types of ports software systems can use to communicate: TCP ports and
UDP Ports.
TCP (Transmission Control Protocol) offers a reliable end–to–end connection using an unre-
liable network. UDP (User Datagram Protocol) is a connectionless transport protocol used to send
datagram packets. Although unreliable, it is fast because of the small protocol overhead.
Network traffic by design is asynchronous. That means a software system and especially
a service have no influence on when traffic arrives. Although most operating systems
today employ multitasking, it would be a waste of resources to let each task continuously poll
for incoming events. Instead, the operating system captures all traffic
of a network interface using asynchronous event mechanisms such as hardware interrupts.
Tasks use a blocking operating system call to wait for incoming data — blocking means that
the task remains suspended until the event it waits for occurs.
A socket can be seen as a mediator between the lower level operating system and the ap-
plication. Roughly the socket is a queue. The higher–level application asks the operating
system to listen for all messages arriving at a certain address, to push them into the socket
queue and to notify the caller after data has arrived. To remain synchronous, operating sys-
tem calls like listen(), accept(), recv(), and recvfrom() are blocking calls. That means the
operating system puts the thread or process to sleep until it receives data. After receiving
data and writing it into the socket queue, the operating system will wake the thread or pro-
cess. The abstract system structure of a set of agents communicating via sockets is shown in
figure A.3.
(Figure A.3: agents communicating over virtual channels provided by a relay agency.)
Figure A.4 shows a detailed view. Sockets are managed by a socket communication service,
which provides the socket API to the communication partners, i.e. the different application
components. Using this API it is possible to set up and use sockets without any knowledge
about the data structure of sockets and the underlying protocols used for communication.
(Figure A.4: communication partners on different hosts use the socket API (create socket, bind net address to socket, listen, connect, accept, send, receive, close) provided by a socket communication service; for each socket the service keeps management data such as a local socket identifier, the local net address (host:port), the dialog partner's net address, and inbox/outbox buffers.)
(Figure A.5: the system calls used by an agent waiting for communication (server side: create socket, bind net address, listen, accept, receive, send, close) on its dial-up socket, and by an agent initiating communication (client side: create socket, connect, send, receive, close) on its connection socket.)
The first type, generally referred to as the “dialup socket” or “server socket” or “listen socket”, is
bound to one or more local IP addresses of the machine and a specific port. After a connec-
tion was requested by a remote application, a new socket usually called “connection socket”
is created using the information of the packet origin (IP address and port) and the informa-
tion of the local machine (IP address and port). This socket is then used for communication
via the TCP connection. Figure A.5 shows the different kinds of sockets and the operating
system calls to use and control them. Figure A.6 illustrates the relation between ports and
connection and dial–up sockets, respectively.
(Figure A.6: relation between client and server ports and the connection and dial–up sockets.)
Basically, a listen socket only contains information about the local port and IP address, it is
therefore only used to listen, as the operating system would not know where to send infor-
mation. The connection socket contains both origin and local information and can therefore
be used to transmit and receive data.
There may be multiple TCP sockets (one listen socket and an undefined number of connec-
tion sockets) for each port. The operating system decides based on the information of the
communication partner to what socket the received information has to be queued.
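A minimal sketch of the server-side call sequence in plain C (standard BSD socket API; the port number is invented and error handling is omitted):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    /* dial-up (listen) socket: bound to a local address and port */
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);   /* all local IP addresses */
    addr.sin_port = htons(8080);

    bind(listen_fd, (struct sockaddr *) &addr, sizeof(addr));
    listen(listen_fd, 5);                       /* backlog of 5 */

    /* accept() blocks until a client connects and returns a new
     * connection socket used for the actual data transfer */
    int conn_fd = accept(listen_fd, NULL, NULL);

    const char *msg = "hello\n";
    write(conn_fd, msg, strlen(msg));

    close(conn_fd);                             /* close the connection socket */
    close(listen_fd);                           /* close the dial-up socket */
    return 0;
}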
UDP ports do not employ connection–oriented techniques or reliability mechanisms. Therefore
a UDP socket is created using the local IP address and the port number to be used. Only
one socket can be created for one port as a UDP socket is never associated with a connection
partner. For sending, a host has to supply information about the communication partner;
for receiving, it is optional.
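For comparison, a minimal UDP sketch using standard BSD socket calls (the port number is invented and error handling is omitted); as noted below, Apache itself does not use UDP:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    /* one UDP socket per port, bound to a local address */
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    struct sockaddr_in local;
    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    local.sin_port = htons(9999);
    bind(fd, (struct sockaddr *) &local, sizeof(local));

    /* receiving: the sender's address is optionally filled in */
    char buf[512];
    struct sockaddr_in peer;
    socklen_t peerlen = sizeof(peer);
    ssize_t n = recvfrom(fd, buf, sizeof(buf), 0,
                         (struct sockaddr *) &peer, &peerlen);

    /* sending: the communication partner must be supplied explicitly */
    if (n > 0)
        sendto(fd, buf, (size_t) n, 0, (struct sockaddr *) &peer, peerlen);

    close(fd);
    return 0;
}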
In a multitasking environment, multiple threads or processes that each serve a separate com-
munication partner would, unlike with TCP, not each have their own socket but share one socket.
When receiving and transmitting, they then supply the address of their communication
partner. As HTTP is based on TCP when used in TCP/IP networks, UDP is not used in
Apache.
A.4 Pipes
In Unix–based operating systems, the primary means for inter–process communication is the
pipe concept. A pipe is a unidirectional data connection between two processes. A process
uses file descriptors to read from or write to a pipe.
A pipe is designed as a queue offering buffered communication. Therefore the operating
system will receive messages written to the pipe and make them available to entities reading
from the pipe.
The pipe is referenced using file descriptors for both end points. Each endpoint can have
more than one file descriptor associated with it.
Most software systems use the functionality and properties of process creation using
fork(). Fork copies program, memory and the file descriptor table of the parent to the
new child process. When a parent process wants to create a process that it wants to com-
municate with afterwards, it will create a pipe prior to the process creation. When the new
process is spawned, the parent process and all its properties are copied. Therefore the file
descriptors containing references to the pipe are copied as well. Now both processes have
references to the input and the output end of the pipe. If one process only uses the input end
and the other only uses the output end of the pipe, a one-way communication is established
between both processes. As pipes can transmit any data as a stream, a pipe can be used
to transmit data or simply to synchronize operations between both processes. If two-way
communication is desired, two pipes have to be used.
By default, each process has three file descriptors when it is started. These file descriptors
are called STDIN, STDOUT and STDERR. As the names suggest, STDIN is the standard
means to receive data, STDOUT is used to output data and STDERR is used to transmit er-
ror messages. By default these file descriptors are all connected via pipes to the shell that
executed the program. A very popular way to establish communication with a child pro-
cess is to simply redirect the STDIN and STDOUT to a pipe that was created by the parent
process prior to spawning the child. However the main advantage is that the newly created
process with its redirected pipes can execute a separate program in its own environment
using exec(). Although exec() resets the process’ memory and loads a new program, it
keeps the file descriptor table. The new program therefore inherits the process’ redirected
STDIN and STDOUT, which enables communication with the parent process; the parent can
thus interact with the executed external program.
In Apache this is used for external script interpreters. Figure A.7 shows the static structure
of such a pipe setup after the STDIN and STDOUT of parent and child have been redirected
to a pipe. Another pipe would be needed to establish two-way communication.
A.4.3 Implementation
Pipes are created using the pipe() system call. Pipe() returns an array of file descriptors
of size 2. These array elements are used to point to the input and the output end of the
pipe. After the successful execution of the pipe function, the pipe can be used for input or
output, or it can be redirected to any of the standard file descriptors. To achieve that, the
standard file descriptor in question has to be closed and the pipe end has to be redirected
to the standard descriptor using the function dup(), which expects a single integer (the file
descriptor identifying the pipe end) as its argument. dup() duplicates the given descriptor
into the lowest free descriptor number; as the standard descriptor was closed just before, the
duplicate takes its place, so the standard descriptor now refers to the pipe. By closing all file descriptors of the pipe using
close(), the pipe can be closed as well.
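A minimal sketch of the pipe()/fork() redirection described above in plain POSIX C; dup2() is used here as the usual shorthand for the close()/dup() sequence, and the program started in the child is just an example:

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fd[2];                      /* fd[0]: read end, fd[1]: write end */
    pipe(fd);

    if (fork() == 0) {              /* child */
        close(fd[0]);               /* child only writes */
        dup2(fd[1], STDOUT_FILENO); /* redirect STDOUT into the pipe */
        close(fd[1]);
        execlp("echo", "echo", "hello from the child", (char *) NULL);
        _exit(127);                 /* only reached if exec failed */
    }

    /* parent: read what the child wrote to its STDOUT */
    close(fd[1]);                   /* parent only reads */
    char buf[128];
    ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("child said: %s", buf);
    }
    close(fd[0]);
    wait(NULL);
    return 0;
}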
A drawback of regular pipes lies in the fact that the communication partners have to share
file descriptors to get access to the pipe. This can only be done by copying processes with
fork().
A named pipe (FIFO) can be accessed via its name and location in the file system. Any
process can get access to a named pipe if the access rights of the named pipe allow it. A
named pipe is a FIFO–special file and can be created with the mknod() or mkfifo() system
call.
(Figure A.7: static structure of a pipe connecting parent and child process after STDIN and STDOUT have been redirected to the pipe ends.)
A.5 Longjmp
A.5.1 Concept
Longjump is a programming concept used in C to manipulate the flow of the execution se-
quence. Roughly it can be described as setting a mark by saving the current state of the
processor of a program - in this case a thread or a process. After the state has been saved it
can be restored at a further point in the sequence. To understand this concept it is essential
to know that each process or thread has a stack for subroutine calls. Whenever the processor
executes a subroutine call, it saves the program counter to the stack together with the param-
eters of the procedure call; local variables are also put on the stack. At the end of a subroutine,
the processor reads the saved program counter and discards all data put on the stack after
this entry. The program continues with the next statement. The Longjump concept uses the
fact that the stack contains this information.
When calling the function setjmp(), the current processor registers, including the stack
pointer, are saved with the mark. Therefore the mark can be used to determine at which posi-
tion the execution continues after setjmp(). When at any following point in the program
execution the longjmp() function is used with the mark as an argument, the processor’s
registers, including the stack pointer, are restored to the saved values. The program
will then continue as if the setjmp() call returned again. Only the return value of this call
will be different. Therefore the Longjump concept is very similar to using "goto". However
as opposed to longjmp(), "goto" will not restore any of the program’s state. Therefore
longjmp() can be used for jumps without corrupting the call stack. The drawback of using
Longjump is that one has to be aware of the underlying system’s technology to fully un-
derstand the concept, and of course that the program becomes harder to follow as non-standard
programming concepts are used. A typical application for Longjumps is exception handling.
In this example a single thread or process is considered. At a certain point of the execution
sequence the function sub1() is called. As described above a new element containing in-
formation about the calling routine is put on the stack. During the execution of sub1(),
setjmp() is called to mark the current position of the execution sequence and to save the
state information. Setjmp() returns 0 as it is called to mark the current position. There-
fore the execution continues in the normal branch of the execution sequence. Sub1() then
calls sub2() which will call sub3(). Within sub3() the longjmp() call is issued.
Longjmp() requires two parameters. The first is the mark, used to define the position to
jump to. The second parameter is the return value that the imaginary setjmp() call will
return when being jumped to using the longjmp() call.
The execution will continue with the instruction following the setjmp() call. That will usu-
ally be a decision based on the return value of setjmp() in order to differentiate whether
setjmp() was just called to set the mark or was jumped to using longjmp().
The code for the example above, which illustrates the use of Longjump and also resembles the
Petri net of figure A.8, is:
#include <setjmp.h>
#include <stdlib.h>

void sub3(jmp_buf buffer) { longjmp(buffer, 1); } // jump back to the mark
void sub2(jmp_buf buffer) { sub3(buffer); }       // call the next sub function

int sub1()
{
    jmp_buf buffer; // we can save a stack mark in here
    if (setjmp(buffer))
    {
        // when setjmp is jumped to we will end up here
        exit(0);
    }
    else
    {
        /* when setjmp is called to save the stack
         * pointer we will enter this branch */
        sub2(buffer); // call any sub function
    }
    return 0;
}
(Figure A.8: Petri net of the example: setjmp() saves the stack pointer and program counter and sets the return value to 0; depending on whether the return value equals 0, either sub2() and then sub3() are called or the other branch is taken; longjmp() restores the stack pointer and program counter and sets the return value to 1.)
Figure A.9: Stack content before calling longjmp() in the example above
One of the major uses for the Longjump concept is exception handling. Usually, when using
Longjump for exception handling the program will save a mark using setjmp() right at
the beginning of the execution sequence. It is followed by a decision that will branch to the
normal program execution sequence based on the return value of setjmp(). A return value
of 0 indicates that setjmp() was initially called to save a mark and execution is therefore
continued normally. However, if the return value of setjmp() indicates that setjmp()
was jumped to then it usually includes information about the type of the exception that
occurred. The corresponding exception handling routine can therefore be started. The ad-
vantage of such error handling is its centralization and that it can be used from any point in
the program execution sequence as the mark was saved at the lowest point of the subroutine
execution hierarchy. Implementing exception handling this way very much resembles ex-
ception handling in higher level languages like Java and C++. The difference is that in C++
and Java exceptions can be caught in any procedure along the hierarchy of procedure calls.
To implement such behaviour via Longjump, multiple marks and additional logic to decide
which mark to jump to are required.
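A minimal sketch of such centralized exception handling in plain C (the error codes and function names are invented):

#include <setjmp.h>
#include <stdio.h>

enum { NO_ERROR = 0, ERR_PARSE = 1, ERR_IO = 2 };

static jmp_buf error_mark;          /* set once at the lowest level */

static void parse_input(void)
{
    /* ... on a parse error, raise the "exception" ... */
    longjmp(error_mark, ERR_PARSE);
}

int main(void)
{
    int err = setjmp(error_mark);   /* returns 0 on the initial call */
    if (err != NO_ERROR) {
        /* centralized exception handling */
        printf("caught error %d\n", err);
        return 1;
    }
    parse_input();                  /* normal execution sequence */
    return 0;
}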
While exception handling is the major application area for this concept, Longjump can also
be used to maximize performance by skipping multiple subroutine returns. A recursive al-
gorithm, for example, might need a large number of function calls to compute the final result
within the deepest function of the hierarchy. There the result is known but multiple function
returns are needed to transfer the result back to the function originally triggering the recur-
sion. In this case one single longjump can transfer the result by simply skipping the returns
and restoring the original stack pointer that was current before the recursive algorithm was
started.
Longjump can be used to jump within different stacks or even stacks of different threads. In
an environment where multiple stacks exist to keep track of execution sequences - a stack
might have been manually created or multiple threads that all have an own stack exist -
Longjump can be used to set the stack pointer of a thread to a stack that originally does
not belong to that thread. This can be used to implement a scheduling algorithm within a
process. However modeling that kind of behaviour is rather difficult and does have many
pitfalls. As you can imagine using one thread to jump to a stack position of a thread that
is still active can corrupt the second thread’s stack and will likely lead to undetermined
behaviour or crashes.
Even though the Longjump concept is a very powerful tool, it still demands a very detailed
understanding of the operating system concept it is based on. Again different operating
systems differ and might not support the functionality equally. Different flavours of Unix
or Linux handle concurrency differently and some save more information on the stack than
others. The existence of the siglongjmp() function, which behaves like longjmp() but
additionally makes sure that the current signal settings are saved and
restored, underlines that fact.
(Figure A.10: stack of the active thread after the function containing the setjmp() instruction has already returned: the saved buffer value points to an outdated, possibly corrupt stack element, while the current stack position is at the frame of the function that called sub1(). Calling longjmp() in such a situation will lead to undetermined behaviour or system crashes, as the stack does not necessarily contain valid information for a Longjump.)
Figure A.10: Calling longjmp() can cause errors when used without caution
Additionally certain thread properties, like open file descriptors are simply not stored in the
stack and can therefore be lost when jumping to a different thread. Local variables on the
other hand are stored on the stack. When jumping from a point in the program sequence
that involves any local pointer variables the developer has to make sure that no memory is
still allocated before longjmp() is called to avoid memory leaks, as these pointer variables
will not exist at the destination of the jump, but the memory will remain allocated until the
program terminates.
A stack mark saved at some point of the program’s execution should not be used
any further once the current stack pointer points, or has pointed, to a lower stack ele-
ment than the marked one. At such a point the procedure that contained the setjmp() call
has already returned. The stack pointer was or is set to a lower level, and any intermedi-
ate subsequent procedure call might have overwritten the element the saved pointer value
originally pointed to. Therefore undetermined behaviour or crashes are likely to occur.
In the end Longjump is a very powerful tool when used with great care. On the other hand
the excessive use of longjump can create very confusing execution sequences and procedure
call hierarchies that are hard to model and keep track of. The code will also become harder
to read the more Longjump is used.
Appendix B
Sources
B.1 Simple HTTP Server
97 # of 1 connection simultaneously
98 serversocket.listen(1)
99 # request-response loop
100 try:
101 while 1:
102 # wait for request and establish connection
103 ( clientsocket, address ) = serversocket.accept()
104 process_request( clientsocket, address )
105 # Close connection
106 clientsocket.close()
107 finally:
108 # Deactivation
109 serversocket.close()
Appendix C
Fundamental Modeling Concepts (FMC)
Purpose of this document This quick introduction will give you an idea of what FMC
is all about by presenting you the key concepts starting with a small but smart example.
Following the example, you will learn about FMC’s theoretical background, the notation and
hopefully get a feeling about the way FMC helps to communicate about complex systems.
At first glance, you might even find the example trivial, but keep in mind that this little
example presented here may be the top–level view of a system being realized by a network
of hundreds of humans and computers running software built from millions of lines of
code. It would hardly be possible to efficiently develop such a system without efficient
ways to communicate about it.
Levels of abstraction The example describes different aspects of a travel agency system.
Starting with a top level description of the system, we will shift our focus toward imple-
mentation, while still remaining independent from any concrete software structures. So
don’t expect to see UML class diagrams, which doubtless might be helpful to represent the
low–level structures of software systems.
(Figure C.1: Block diagram of the travel agency example: the travel agency with its reservation system and information help desk, and the travel organizations.)
abstract than anything else we consider to be real. Looking at the system we see components,
artifacts of our mind, which relate to tangible distinguishable physical phenomena, like an
apple, a computer, our family or a TV station — entities which exist in time and space. So
an informational system can be seen as a composition of interacting components called agents.
Each agent serves a well–defined purpose and communicates via channels and shared storages
with other agents. If an agent needs to keep information over time, it has access to at least
one storage to store and retrieve information. If the purpose is not to store, but to transmit
information, the agents are connected via channels.
Agents are drawn as rectangular nodes, whereas locations are symbolized as rounded
nodes. In particular, channels are depicted as small circles and storages are illustrated as
larger circles or rounded nodes. Directed arcs symbolize whether an agent can read or write
information from or to a storage.
Storage access In the example, the arcs directed from the nodes labeled "travel organiza-
tion" to the storage node labeled "travel information" symbolize that the travel organizations
write the travel information. Correspondingly the arc directed from the storage node labeled
"travel information" to the agent node labeled "information help desk" symbolizes that the help
desk reads the travel information. If an agent can modify the contents of a storage regard-
ing its previous contents, it is connected via a pair of opposed bound arcs, called modifying
arcs. The access of the reservation system and the travel organization to the customer data
storage is an example.
Informational systems are dynamic systems. By looking at the channels and locations used
to store, change and transmit information for some time, their behavior can be observed. Petri
nets are used to visualize the behavior of a system on a certain level of abstraction.
Figure C.2 shows a Petri net describing the causal structure of what can be observed on the
channel between the travel agency and one of its customers in our example. Buying a ticket
starts with the customer ordering a ticket. Then the travel agency checks the availability and
in case this step is successful, concurrently a ticket may be issued to the customer and pay-
ment is requested. The customer is expected to issue the payment and when both sides have
acknowledged the receipt of the money respectively the ticket, the transaction is finished.
(Figure C.2: Petri net of the ticket purchase: order ticket; check availability; if it fails, handle the problem; on success, concurrently issue the ticket and request payment; issue payment; both sides acknowledge receipt.)
Building blocks Petri nets describe the causal relationship between the operations, which
are performed by the different agents in the system. Each rectangle is called a transition and
represents a certain type of operation. The transitions are connected via directed arcs with
circular nodes called places. Places can be empty or marked, which is symbolized by a black
token. The behavior of the system can now be simulated by applying the following rule to
the Petri net: Whenever there is a transition with all its input places being marked and all
its output places being unmarked, this transition may fire, meaning the operation associated
with the transition is performed. Afterward all input places of the transition are empty and
all its output places are marked.
So, looking at the Petri net shown in figure C.2, in the beginning only the transition labeled
"order ticket" may fire. This means the first operation in the scenario being described will be
the customer ordering a ticket. Because only the initial marking of a Petri net may be shown
in a printed document, it is necessary to process the net by virtually applying the firing rules
step by step, until you get an understanding of the behavior of the system. This is very easy,
as long as there is only one token running through the net.
Patterns Common patterns are sequences of actions, loops and conflicts. A conflict is given
if multiple transitions are ready to fire, which are connected with at least one common input
place. Because the marking of that input place can not be divided, only one of the transitions
may fire. In many cases, a rule is given to solve the conflict. In those cases predicates labeling
the different arcs will help to decide which transition will fire. For example, different ac-
tions have to be taken depending on the outcome of the availability check. If the check was
successful, the travel agency will issue the ticket and request payment.
Concurrency and Synchronization In our example, issuing the ticket and payment should
be allowed to happen concurrently. Using Petri nets, it is possible to express concurrency by
entering a state where multiple transitions may fire concurrently. In the example, we intro-
duce concurrency by firing the unlabeled transition, which has two output places. After-
ward both transitions, the one labeled "issue ticket" and the one labeled "request payment", are
allowed to fire in any order or even concurrently. The reverse step is synchronization, where
one transition has multiple input places, which all need to be marked before it is ready to
fire.
Looking at dynamic systems, we can observe values at different locations which change
over the time. In our model, agents, which have read and write access to these locations
are responsible for those changes, forming a commonly static structure which is shown in a
block diagram. Petri nets give us a visual description of the agent’s dynamic behavior. To
describe the structure and repertoire of information being passed along channels and placed
in storages, we use entity relationship diagrams.
(Figure C.3: Entity relationship diagram of the reservation data: passengers, partitioned into first time and regular customers as well as business and private persons; reservations relating a passenger, a seat, a vehicle and a tour; the travel agency arranges reservations; a passenger may belong to a business organization; travel organizations realize tours; a tour follows a route that starts at and ends at a location.)
Example Figure C.3 shows an entity relationship diagram representing the structure of the
information, which is found when looking at the storage labeled "reservations" and "customer
data" which both the "reservation system" and the "travel organizations" can access (see figure
C.1). In the middle of the diagram, we see a rounded node labeled "reservations" which
represents the set of all reservations being stored in the system. Such a reservation is defined
by a customer booking a certain tour, allocating a certain seat in a certain vehicle. The tour
will follow a certain route, starting at some location and ending at some location. Looking at
the passengers, first time customers are distinguished from regular customers. Independent
from that, passengers can also be partitioned into business persons and private persons. The
system also stores information about the organization which has arranged a reservation and
which travel organization realizes which tour.
Building blocks In entity relationship diagrams, round nodes visualize different sets of en-
tities each being of a certain type. The sets in the example are passengers, business men,
tours, vehicles etc. Each of them is defined by a set of attributes according to its type. Most
elements of a set have one or more relations to elements of another set or of the same set of el-
ements. For instance, each route has one location to start at and one location to end at. Each
relationship, i.e. each set of relations between two sets of entities being of a certain type, is
represented by a rectangular node connected to the nodes representing the sets of entities
participating in the relationship. So there is one rectangle representing the "start at" relation-
ship and another representing the "ends at" relationship. Annotations beside the rectangle
can be used to specify the predicate by an expression using natural language, which defines
the relationship.
Partitions If one entity node contains multiple sub–nodes, this entity node represents the
union of the entity sets being enclosed. Typically the elements of the union share a common
type, an abstraction characterizing the elements of all subsets. For instance in the example
"first time customers" and "regular customers" define the set of "passengers". But we can also dis-
tinguish "business persons" from "private persons", also the result of another true partitioning
of "passengers". To avoid visual confusion caused by multiple containment nodes crossing
each other, those unrelated partitions are symbolized using a longish triangle.
Further application Entity relationship diagrams may not only be used to visualize the
structure of the information stored in technical systems. They also can help to get some un-
derstanding of new application domains by providing an overview of the relations between
its concepts.
Show purpose of the system So far, the system has been described on a very abstract level
only reflecting its purpose. The implementation of most components is still undefined. We
see a high-level structure, which also could be explained with a few words. Looking at
the block diagram (figure C.1), we only learn that the customers and interested persons are
expected to be humans. Nothing is said about how the reservation system, the help desk,
the travel organization are implemented, what the stored information looks like, whether
it will be an office with friendly employees answering questions and distributing printed
booklets or an IT system, which can be accessed using the Internet. All this is undefined on
this level of abstraction.
System overview needed for communication Nevertheless, the model shows a very con-
crete structure of the example system. The system structure has been made visual, which
highly improves the efficiency of communication. There is some meaningful structure you
can point to, while talking about it and discussing alternatives. This would be inherently im-
possible if the real system did not exist yet or if the system just looked like a set of technical
low-level devices, that could serve any purpose.
By refining the high-level structure of the system while considering additional requirements,
a hierarchy of models showing the system on lower levels of abstraction is being created.
Again, the system description can be partitioned according to the three fundamental aspects,
that define each dynamic system — compositional structure, behavior and value structures.
Making the relationship between the different models visible using visual containment and
descriptive text, comprehension of the systems is maintained over all levels of abstraction.
Using this approach, it is possible to prevent the fatal multiplication of fuzziness, while
communicating about complex structures without anything to hold on to.
A possible implementation of the example system Figure C.4 shows a possible imple-
mentation of the information help desk and the storage holding the travel information. The
storage turns out to be implemented as a collection of database servers, mail and web servers
used by the different travel organizations to publish their documents containing the travel
information. The information help desk contains a set of adapters used to acquire the infor-
mation from the different data sources. The core component of the help desk is the document
builder. It provides the documents assembled from the collected information and from a set
of predefined templates to a web server. Persons being interested in the services from the
travel agency may read the documents from this web server using a web browser. It is not
obvious that the reservation system now has to request travel information from the infor-
mation help desk instead of getting it itself. This is an example of non–strict refinement.
(Figure C.4: Block diagram of a possible implementation: customers and interested persons access an HTTP server of the travel agency; the information help desk contains a document builder with a document cache and templates, which collects travel information through SQL, IMAP4 and HTTP adapters from the database management systems, IMAP servers and HTTP servers of the travel organizations; the reservation system accesses the reservations and customer data.)
We could continue to describe the dynamics and value structures on that level, afterward
refining or introducing new aspects one more time and so on. We won’t do this here. When
to stop this iteration cycle depends on the purpose of your activities: maybe you are go-
ing to create a high–level understanding of the systems’ purpose, maybe you are discussing
design alternatives of the system’s architecture or estimating costs, and so on.
• Didactic system models serving the communication among humans versus analytical
models serving the methodical derivation of consequences
FMC provides the concepts to create and visualize didactic models. This enables people to
share a common understanding of system structure and its purpose. Therefore FMC helps
to reduce costs and risks in the handling of complex systems.
Appendix D
GNU Free Documentation License
0. PREAMBLE
The purpose of this License is to make a manual, textbook, or other functional and useful document "free" in the sense of freedom: to
assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially.
Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible
for modifications made by others.
This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It
complements the GNU General Public License, which is a copyleft license designed for free software.
We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free
program should come with manuals providing the same freedoms that the software does. But this License is not limited to software
manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend
this License principally for works whose purpose is instruction or reference.
1. APPLICABILITY AND DEFINITIONS
This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can
be distributed under the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that
work under the conditions stated herein. The "Document", below, refers to any such manual or work. Any member of the public is a
licensee, and is addressed as "you". You accept the license if you copy, modify or distribute the work in a way requiring permission under
copyright law.
A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with
modifications and/or translated into another language.
A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of
the publishers or authors of the Document to the Document’s overall subject (or to related matters) and contains nothing that could fall
directly within that overall subject. (Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any
mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial,
philosophical, ethical or political position regarding them.
The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that
says that the Document is released under this License. If a section does not fit the above definition of Secondary then it is not allowed to
be designated as Invariant. The Document may contain zero Invariant Sections. If the Document does not identify any Invariant Sections
then there are none.
The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the
Document is released under this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may be at most 25 words.
A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the
general public, that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels)
131
132 APPENDIX D. GNU FREE DOCUMENTATION LICENSE FMC
generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for
automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format
whose markup, or absence of markup, has been arranged to thwart or discourage subsequent modification by readers is not Transparent.
An image format is not Transparent if used for any substantial amount of text. A copy that is not "Transparent" is called "Opaque".
Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML
or XML using a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF designed for human modification.
Examples of transparent image formats include PNG, XCF and JPG. Opaque formats include proprietary formats that can be read and
edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and
the machine-generated HTML, PostScript or PDF produced by some word processors for output purposes only.
The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this
License requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text
near the most prominent appearance of the work’s title, preceding the beginning of the body of the text.
A section "Entitled XYZ" means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses
following text that translates XYZ in another language. (Here XYZ stands for a specific section name mentioned below, such as "Acknowl-
edgements", "Dedications", "Endorsements", or "History".) To "Preserve the Title" of such a section when you modify the Document means
that it remains a section "Entitled XYZ" according to this definition.
The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These
Warranty Disclaimers are considered to be included by reference in this License, but only as regards disclaiming warranties: any other
implication that these Warranty Disclaimers may have is void and has no effect on the meaning of this License.
2. VERBATIM COPYING
You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the
copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no
other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further
copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large
enough number of copies you must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, and you may publicly display copies.
3. COPYING IN QUANTITY
If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and
the Document’s license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover
Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify
you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible.
You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the
Document and satisfy these conditions, can be treated as verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the
actual cover, and continue the rest onto adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable
Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general
network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document,
free of added material. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque
copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the
last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.
It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to
give them a chance to provide you with an updated version of the Document.
4. MODIFICATIONS
You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release
the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution
and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:
A. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which
should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the
original publisher of that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified
Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five),
unless they release you from this requirement.
C. State on the Title page the name of the publisher of the Modified Version, as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.
F. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the
terms of this License, in the form shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document’s license notice.
H. Include an unaltered copy of this License.
I. Preserve the section Entitled "History", Preserve its Title, and add to it an item stating at least the title, year, new authors, and
publisher of the Modified Version as given on the Title Page. If there is no section Entitled "History" in the Document, create one
stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified
Version as stated in the previous sentence.
J. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise
the network locations given in the Document for previous versions it was based on. These may be placed in the "History" section.
You may omit a network location for a work that was published at least four years before the Document itself, or if the original
publisher of the version it refers to gives permission.
K. For any section Entitled "Acknowledgements" or "Dedications", Preserve the Title of the section, and preserve in the section all the
substance and tone of each of the contributor acknowledgements and/or dedications given therein.
L. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are
not considered part of the section titles.
M. Delete any section Entitled "Endorsements". Such a section may not be included in the Modified Version.
N. Do not retitle any existing section to be Entitled "Endorsements" or to conflict in title with any Invariant Section.
If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material
copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the
list of Invariant Sections in the Modified Version’s license notice. These titles must be distinct from any other section titles.
You may add a section Entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various
parties — for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of
a standard.
You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of
the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or
through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by
you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one,
on explicit permission from the previous publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or
imply endorsement of any Modified Version.
5. COMBINING DOCUMENTS
You may combine the Document with other documents released under this License, under the terms defined in section 4 above for
modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified,
and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers.
The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single
copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by
adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number.
Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.
In the combination, you must combine any sections Entitled "History" in the various original documents, forming one section Entitled
"History"; likewise combine any sections Entitled "Acknowledgements", and any sections Entitled "Dedications". You must delete all
sections Entitled "Endorsements."
6. COLLECTIONS OF DOCUMENTS
You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies
of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this
License for verbatim copying of each of the documents in all other respects.
You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of
this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.
7. AGGREGATION WITH INDEPENDENT WORKS
A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage
or distribution medium, is called an "aggregate" if the copyright resulting from the compilation is not used to limit the legal rights of the
compilation’s users beyond what the individual works permit. When the Document is included in an aggregate, this License does not apply
to the other works in the aggregate which are not themselves derivative works of the Document.
If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half of the
entire aggregate, the Document’s Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic
equivalent of covers if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the whole aggregate.
8. TRANSLATION
Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Re-
placing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of
some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License,
and all the license notices in the Document, and any Warranty Disclaimers, provided that you also include the original English version of
this License and the original versions of those notices and disclaimers. In case of a disagreement between the translation and the original
version of this License or a notice or disclaimer, the original version will prevail.
If a section in the Document is Entitled "Acknowledgements", "Dedications", or "History", the requirement (section 4) to Preserve its Title
(section 1) will typically require changing the actual title.
9. TERMINATION
You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License. Any other attempt
to copy, modify, sublicense or distribute the Document is void, and will automatically terminate your rights under this License. However,
parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties
remain in full compliance.
10. FUTURE REVISIONS OF THIS LICENSE
The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such
new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See
http://www.gnu.org/copyleft/.
Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this
License "or any later version" applies to it, you have the option of following the terms and conditions either of that specified version or
of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version
number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation.
Bibliography
[2] The Apache Software Foundation. Apache HTTP Server Project. Web site.
http://httpd.apache.org.
[3] B. Gröne, A. Knöpfel, and R. Kugel. The Apache Modelling Project. Web site.
http://www.fmc-modeling.org/projects/apache.
[4] Lincoln Stein and Doug MacEachern. Writing Apache Modules with Perl and C. O’Reilly, 1999.
[5] Douglas Schmidt, Michael Stal, Hans Rohnert, and Frank Buschmann. Pattern-Oriented
Software Architecture: Patterns for Concurrent and Networked Objects, volume 2. John Wiley
and Sons, Ltd, 2000.
Glossary
CGI Common Gateway Interface, one of the first techniques for enhancing web servers
and creating dynamic and multifunctional web pages. Using CGI, the web server can
execute external applications or scripts which usually generate an HTML document. If
used with HTML forms, the client transmits data with parameters encoded in the URL or
in the message body of a "POST" request. Today other techniques like PHP, servlets,
JavaServer Pages (JSP) or ASP are widely used alternatives to CGI. . . . . . . . . . . . . . . . . . . 25
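To illustrate the interface (a minimal sketch, not taken from the Apache sources): a CGI
program reads its parameters from environment variables such as QUERY_STRING and writes
a header block followed by the generated document to standard output.

/* Minimal CGI program (sketch): the web server sets environment variables
 * such as QUERY_STRING and expects a header block, an empty line and the
 * generated document on standard output. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *query = getenv("QUERY_STRING");   /* parameters encoded in the URL */

    printf("Content-Type: text/html\r\n\r\n");    /* header, then empty line */
    printf("<html><body>\n");
    printf("<p>Query string: %s</p>\n", query ? query : "(none)");
    printf("</body></html>\n");
    return 0;
}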
Filter A Filter processes data by reading from an input and writing to an output channel.
A sequence of filters where one filter processes the output of another is called a filter chain.
Apache 2 uses an input filter chain to process the HTTP request and an output filter chain
to process the HTTP response. Modules can dynamically register filters for both chains.
40
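As an illustration, a minimal sketch of an Apache 2 output filter module (the module and
filter names are made up for this example; configuration and error handling are omitted).
It registers a filter that passes all data on unchanged; such a filter could then be enabled
for a resource with the SetOutputFilter directive.

#include "httpd.h"
#include "http_config.h"
#include "util_filter.h"

/* Output filter callback: receives a bucket brigade, may inspect or
 * transform it, and hands it on to the next filter in the chain. */
static apr_status_t passthrough_filter(ap_filter_t *f, apr_bucket_brigade *bb)
{
    return ap_pass_brigade(f->next, bb);
}

static void register_hooks(apr_pool_t *p)
{
    ap_register_output_filter("PASSTHROUGH", passthrough_filter,
                              NULL, AP_FTYPE_RESOURCE);
}

module AP_MODULE_DECLARE_DATA passthrough_module = {
    STANDARD20_MODULE_STUFF,
    NULL, NULL, NULL, NULL, NULL,   /* no configuration handling */
    register_hooks
};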
Handler A handler is a callback procedure registered for a certain event. When the event
occurs, the event dispatcher will call all handlers in a specific order. In Apache, the
events during request processing are marked by hooks. Apache modules register
handlers for certain hooks, for example for "URI translation". . . . . . . . . . . . . . . . . . . . . . . . 35
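For illustration, a minimal sketch of an Apache 2 module that registers a handler for the
"URI translation" hook (the module and function names are made up for this example; the
handler simply declines so that other modules or the core translate the URI).

#include "httpd.h"
#include "http_config.h"

/* Handler for the translate_name hook: called by the hook dispatcher for
 * every request; it may map r->uri to a filename or return DECLINED. */
static int example_translate(request_rec *r)
{
    return DECLINED;
}

static void register_hooks(apr_pool_t *p)
{
    ap_hook_translate_name(example_translate, NULL, NULL, APR_HOOK_MIDDLE);
}

module AP_MODULE_DECLARE_DATA example_module = {
    STANDARD20_MODULE_STUFF,
    NULL, NULL, NULL, NULL, NULL,   /* no configuration handling */
    register_hooks
};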
Header Most messages transmitted on the Internet consist of a header and a body. A
header usually contains protocol information and meta information about the payload
transmitted in the body. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Hook A hook is a processing step where the handlers (callback procedures) registered for
this Hook will be called. In Apache 1.3, the hooks were defined by the module API,
while Apache 2 allows adding new hooks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
HTML Hypertext Markup Language is a document format used for documents
in the World Wide Web. An HTML document basically contains the text, together with
formatting hints, information about the text and references to further components of the document
like images that should be included when displaying the page. A web server’s task is
to deliver the HTML page and supply the resources referenced within the HTML page
when requested. HTML documents can be static files or created dynamically by the
server. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Job Queue Server Model A Multitasking Server Model using a job queue to provide com-
munication between dedicated listener task(s) and a pool of idle worker tasks. A listener
task waits for a request and puts a job entry holding the connection information into the
queue. An idle worker task waits until it gets a job entry and uses the connection infor-
mation to get and process the request.
An idle worker queue can be used to make the listener accept requests only if there is at
least one idle worker in this queue; otherwise the request could not be processed immediately. . 73
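A minimal sketch of such a job queue with POSIX threads (an illustration of the principle,
not Apache source code): the listener blocks while the queue is full, idle workers block
while it is empty.

#include <pthread.h>

#define QUEUE_SIZE 16

static int queue[QUEUE_SIZE];            /* job entries: accepted connection descriptors */
static int head, tail, count;
static pthread_mutex_t lock      = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;

void queue_put(int conn_fd)              /* called by the listener task */
{
    pthread_mutex_lock(&lock);
    while (count == QUEUE_SIZE)          /* wait until a worker has taken a job */
        pthread_cond_wait(&not_full, &lock);
    queue[tail] = conn_fd;
    tail = (tail + 1) % QUEUE_SIZE;
    count++;
    pthread_cond_signal(&not_empty);     /* wake one idle worker */
    pthread_mutex_unlock(&lock);
}

int queue_get(void)                      /* called by an idle worker task */
{
    pthread_mutex_lock(&lock);
    while (count == 0)
        pthread_cond_wait(&not_empty, &lock);
    int conn_fd = queue[head];
    head = (head + 1) % QUEUE_SIZE;
    count--;
    pthread_cond_signal(&not_full);      /* the listener may accept again */
    pthread_mutex_unlock(&lock);
    return conn_fd;
}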
Leader–Follower Server Model A Multitasking Server Model using a pool of tasks that
change their roles: one listener task waits for requests (the leader) while idle tasks (the
followers) wait to become the new listener. After the listener task has received a request,
it changes its role and becomes a worker processing the request; one idle task now
becomes the listener. After the work has been done, the worker becomes an idle
task. The Preforking Model (see p. 58) is a Leader–Follower Model. . . . . . . . . . . . . . . . . . . 79
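A sketch of the principle with POSIX threads (an illustration, not the Apache source): all
tasks of the pool run the same loop, and an accept mutex ensures that exactly one of them,
the current leader, is waiting in accept() at any time.

#include <pthread.h>
#include <sys/socket.h>

static pthread_mutex_t accept_mutex = PTHREAD_MUTEX_INITIALIZER;

void *pool_task(void *arg)
{
    int listen_fd = *(int *)arg;

    for (;;) {
        pthread_mutex_lock(&accept_mutex);           /* become the leader */
        int conn_fd = accept(listen_fd, NULL, NULL);
        pthread_mutex_unlock(&accept_mutex);         /* a follower becomes the new leader */

        if (conn_fd >= 0) {
            /* act as a worker: read the request, send the response, close
             * the connection, then return to the pool as an idle task */
        }
    }
    return NULL;
}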
Port In TCP/IP, a port is part of an address. Each node on the Internet can be identified
by its IP address. Using TCP or UDP, a node can offer 65536 ports for each of these
protocols on each IP address, and assign ports to services or client applications. A TCP
or UDP packet contains both origin and destination addresses including the ports, so
the operating system can identify the application or service that is its receiver. . . . . . . . . 9
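For illustration, a sketch using the BSD socket API (error handling omitted): a server
claims a port by binding a socket to it; the operating system then delivers TCP connections
addressed to that port to this socket.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

int open_http_port(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);   /* TCP socket */
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);   /* any local IP address */
    addr.sin_port        = htons(80);           /* the well-known HTTP port */

    bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(fd, 128);                            /* queue up to 128 pending connections */
    return fd;
}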
Preforking Server Model A Multitasking Server Model using a pool of Unix processes
(see Leader–Follower above). In contrast to the straightforward solution, where a master
server process waits for a request and creates a child server process to handle each incoming
request, the Apache master server process creates a number of child server processes that
wait for the requests themselves. The master server is responsible for keeping the number
of idle servers within a given limit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
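A strongly simplified sketch of the principle (an illustration, not the Apache source, which
additionally uses the scoreboard and adjusts the number of idle servers): the master forks
a fixed number of children in advance, each child accepts and handles connections itself,
and the master only watches its children.

#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

void preforking_master(int listen_fd, int num_children)
{
    for (int i = 0; i < num_children; i++) {
        if (fork() == 0) {                              /* child server process */
            for (;;) {
                int conn_fd = accept(listen_fd, NULL, NULL);
                if (conn_fd >= 0) {
                    /* handle one HTTP connection, then wait for the next */
                    close(conn_fd);
                }
            }
        }
    }
    for (;;)
        wait(NULL);   /* Apache would restart children and enforce idle-server limits here */
}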
Process In the Multitasking context, a process is a task with high protection and isolation
against other processes. The memory areas of two processes are separated completely.
Processes can be executed using different user IDs resulting in different access rights.
Sub–processes share file handles with their parent process. . . . . . . . . . . . . . . . . . . . . . . . . . 56
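A small sketch (not Apache code) showing the memory isolation of processes: a variable
changed in the child process created by fork() keeps its old value in the parent.

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int counter = 0;
    pid_t pid = fork();

    if (pid == 0) {                 /* child: works on its own copy of the memory */
        counter = 42;
        printf("child:  counter = %d\n", counter);   /* prints 42 */
        _exit(0);
    }
    waitpid(pid, NULL, 0);          /* parent waits for the child to terminate */
    printf("parent: counter = %d\n", counter);       /* still prints 0 */
    return 0;
}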
Thread In the Multitasking context, a thread is a task with low protection and isolation
against other threads because they share memory and file handles. Usually, a thread
runs in the context of a process. A crashing thread will not only die itself but also cause
all other threads of the same context to die as well. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
URL A Uniform Resource Locator is an address for a resource on the Internet. A resource
in this context can be a document, a file or dynamically generated information. A URL
usually contains information about the protocol which has to be used to access the resource,
the address of the node the resource resides on and the location of the resource within the
target node. Furthermore, it can contain additional information like parameters, especially
for dynamically generated information. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Index
transition, 125
type of entities, 127
UDP, 8
Uniform Resource Identifier (URI), 11
URI, 11
URI translation, 94
Compositional Structures
FMC Block diagrams - Reference Sheet
[Annotated example block diagram, not reproduced: a human user with a web browser
(agent) communicates through sockets and the TCP/IP communication service with the
Apache HTTP server (master server and child servers), which accesses files, documents and
configuration data; the annotations point out agents, channels, storages, structure variance
and shared storage with read or read/write access.]
FMC Block diagrams show the compositional structure of a system as a composition of collaborating system components.
There are active system components called agents and passive system components called locations. Each agent processes
information and thus serves a well-defined purpose. To do so, an agent stores information in storages and communicates via
channels or shared storages with other agents. Channels and storages are (virtual) locations where information can be
observed.
basic elements (graphical symbols not reproduced; only the descriptions survive)
unidirectional connection: Depicts the data flow direction between an active and a passive
system component.
bidirectional connection: Like a unidirectional connection, but the data flow is not strictly from one
component to the other; its direction is unspecified.
common structures
write access: Agent A has write access to storage S. In case of writing, all
information stored in S is overwritten.
read / write access (modifying access): Agent A has modifying access to storage S. That means that some
particular information of S can be changed.
unidirectional communication channel: Information can only be passed from agent A1 to agent A2.
[Advanced block diagram notation; only the labels "means of transfer", "transmission" and
"transaction" survive from the diagrams.]
Value Range Structures
FMC Entity Relationship Diagrams - Reference Sheet
[Annotated example entity relationship diagram, not reproduced: HTTP request and response
messages and their headers, related by "sends" / "is sent by" relations, with relation names,
roles, cardinalities (1:1, 1:n) and reification pointed out.]
FMC Entity Relationship Diagrams are used to depict value range structures or topics as mathematical structures.
Value range structures describe observable value structures at locations within the system, whereas topic diagrams have a
much wider use and can cover any correlation between points of interest.
basic elements (graphical symbols only; not reproduced)
further elements
structure entity: Used to create an entity from a structure (entities and relations).
common structures
n:m relation (cardinalities i..n and j..m): Each element of E1 occurs i to n times in the relation with E2, while
each element of E2 occurs j to m times in the relation.
advanced
[Three example diagrams, not reproduced: 1) an n-ary relation (e.g., ternary), 2) reification,
3) orthogonal partitioning; they are explained below.]
1) Sometimes it is necessary to correlate more than two entities to each other via n-ary relations. The example shows a
ternary relation.
2) Elements of a relation constitute the elements of a new entity, which in turn can participate in other relations. The example
shows the relation C being reified.
3) Partitioning of entity E into the entities X, Y and an additional, independent partitioning of entity E into the entities A, B.
Imagine for instance entity E as "Human Being". Then entity X may stand for "Man", Y for "Woman", and independently
of that, entities A and B could stand for "European" and "Non-European".
Dynamic Structures
Petri nets (1/2) - Basic Reference Sheet
FMC diagrams for dynamic structures are based on place/transition Petri nets. They are used to express system behaviour
over time, depicting causal dependencies. They clarify how a system works and how communication takes place
between different agents.
Here only the basic notational elements are covered; the rest can be found on the more advanced reference sheet (2/2).
basic elements / further elements / common structures
[Diagrams not reproduced; the surviving labels belong to the five common structures
explained below: 1) sequence, 2) concurrency, 3) case (conflict), 4) loop,
5) communication across a swimlane divider.]
1) Defines that transition T1 fires first, followed by transition T2, followed by transition T3, and so on.
2) Means that the transitions have no causal ordering. The transitions T1, …, Tn are concurrent; their firing has no
particular order.
3) Is used to choose one transition among others. Only one of the transitions T1, …, Tn will fire, depending on the conditions
C1, …, Cn associated with the arcs.
4) Is used to repeat the firing. Transition T1 will be repeated as long as condition C1 is fulfilled. Often C2 is not mentioned, as
it is assumed to be “else”.
5) Whenever a swimlane divider is crossed, communication takes place. With this structure all possible communication
types can be expressed (synchronous, asynchronous etc.).
[Annotated example Petri net, not reproduced: a child server initializes, acquires the accept
mutex, receives a request, releases the mutex and processes the request; depending on whether
a static or a dynamic page was requested, the page is returned or rendered. The annotations
point out unmarked transitions, places, directed arcs, conditions, multiplicities, the responsible
agent, the swimlane divider and a transition-bordered sub net.]
Dynamic Structures
Petri nets (2/2) - Advanced Reference Sheet
FMC diagrams for dynamic structures are based on place/transition Petri nets. They are used to express system behaviour
over time, depicting causal dependencies. They clarify how a system works and how communication takes place
between different agents.
Here only the advanced notational elements are covered; the rest can be found on the basic reference sheet (1/2).
extended elements
multi-token place: Places which can hold multiple tokens, but not an infinite number,
are indicated as enlarged places with an annotation specifying the
capacity (cap. n, n > 1). Places with an infinite capacity (cap. ∞) are indicated by a
double circle.
recursion elements
enclosing recursive program part Each recursive diagram shows the following characteristics:
program
part 1) There is an entry point of the recursion (place a). Initially
called by the enclosing program part it is called afterwards
a several times by the recursive program part itself.
2) Transition R represents the reaching of the end-condition,
which is always present to finish the recursion by determining
the function value of at least one argument without calling the
c S recursive part again.
end-condition reached else
3) Stack places like b, c and d are always input places for
transitions that additionally have a return place (e) as input.
b S R All the stack places together constitute the return stack
which is used to store information about return positions.
4) A return place (e) is always input place for at least two
d S transitions that also have stack places (b, c, d) as input
places.
5) Be aware that the return stack's only task is to guide the
recursion handling. In addition all the necessary data stack
e R modifications like PUSH, POP and TOP have to be done to
remember values such as intermediate results.
[Example with a multi-token place, not reproduced: depending on whether the printer is a
PostScript printer or not, the job is transformed to PostScript or printer-specific data is
created; the data is then sent to the printer.]
[Example of a recursive Petri net computing Fib(arg), not reproduced: the transitions are
annotated with data stack operations (PUSH, POP, TOP) that pass arguments and collect
intermediate results; the annotations point out the stack places and the return place of the
generic recursion scheme.]
Fib(arg) = 1                              if arg = 0
           1                              if arg = 1
           Fib(arg - 1) + Fib(arg - 2)    if arg > 1, arg ∈ N
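For comparison, the same function as a recursive C routine (a sketch; the stack and return
places of the Petri net correspond to the call stack that is managed implicitly here):

/* Fibonacci variant as defined above: Fib(0) = Fib(1) = 1. */
unsigned long fib(unsigned int arg)
{
    if (arg == 0 || arg == 1)
        return 1;                        /* end-condition reached */
    return fib(arg - 1) + fib(arg - 2);  /* two recursive calls, results added */
}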