Week8 1 Odp
Week 8
Webservers
Outline
●
Resource location
– URLs
●
HTTP & CGI
●
Apache Configuration
●
Notes about the lab
Resource Identification
●
There is an incredible amount of information
on the internet.
– So how do you find what you need? How do you
come back to it later?
●
You need some way to refer back to resources
(files, pages, etc.).
●
URLs, URNs, URIs.
Resource Identification
●
URIs (Uniform Resource Identifiers) are a
general way of pointing to information.
●
They can be broadly broken down into two
main groups:
– URNs (Uniform Resource Names) – What the file
is, but not where it is.
●
The DOI on a journal article would be a good example.
– URLs (Uniform Resource Locators) – Where it is,
and possibly (but not always) what it is.
●
This is what you'll be dealing with most of the time.
URLs
●
A URL is mostly concerned with the location of
a resource (hence the name).
●
It tells you how to reach the object in question.
●
It absolutely must have a protocol and a
hostname.
– i.e. Contact this machine, speaking this language.
– A web address is an example of a URL.
– http://matrix.senecac.on.ca
URLs
●
If you do not include the optional elements, the
host you contact will use default values.
– Note that I didn't specify a port, but still got to a
webpage.
●
Because the default for http is port 80.
●
In some cases, machines will move things onto non-
standard ports (you did this with ssh in OPS235, and will do
so with http in this lab).
●
The default port then no longer leads there, so someone who
doesn't know which port to use (e.g. an unauthorized user)
will find it harder to get access to your data.
●
http://matrix.senecac.on.ca:80/~peter.callaghan
– Also try with a different port.
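The default-port rule above can be sketched in a few lines of Python. This is only an illustration: the DEFAULT_PORTS table and the effective_port helper are names made up for this example, and the table covers http and https only.

```python
from urllib.parse import urlsplit

# Well-known default ports for http and https (other protocols omitted).
DEFAULT_PORTS = {"http": 80, "https": 443}

def effective_port(url):
    """Return the port a client would contact for this URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        # The URL names a port explicitly, so that one wins.
        return parts.port
    # No port given: fall back to the protocol's default.
    return DEFAULT_PORTS[parts.scheme]

print(effective_port("http://matrix.senecac.on.ca/~peter.callaghan"))
print(effective_port("http://matrix.senecac.on.ca:8080/~peter.callaghan"))
```

The port 8080 in the second call is an arbitrary non-standard port chosen for the example, not one the lab prescribes.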
URLs
●
Also note that the path is always relative to the directory the server publishes, not to the root of its filesystem.
– Hosts do not make their root available on the internet.
●
Not for long anyway...
– It can be a complete path, or a partial one.
●
If partial (a directory rather than a file), you may get a default resource
(index.html) or be refused access.
●
http://matrix.senecac.on.ca/~peter.callaghan/index.html
●
http://matrix.senecac.on.ca/~peter.callaghan/files/OPS335
●
http://matrix.senecac.on.ca/~peter.callaghan/files/OPS335/assignment1.2.html
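A rough sketch of how a server might turn a URL path into a file on disk. It is a simplification, not apache's actual logic: DOCUMENT_ROOT is a hypothetical directory, and the caller states whether the path names a directory so the example never touches the real filesystem.

```python
import posixpath

DOCUMENT_ROOT = "/var/www/html"  # hypothetical; not taken from the lecture
DEFAULT_FILE = "index.html"      # the default resource for a partial path

def resolve(url_path, is_directory):
    """Map a URL path to a file path under DOCUMENT_ROOT (simplified)."""
    # Normalise first so "../" cannot climb above the published directory.
    clean = posixpath.normpath("/" + url_path)
    full = posixpath.join(DOCUMENT_ROOT, clean.lstrip("/"))
    if is_directory:
        # Partial path: fall back to the default resource.
        full = posixpath.join(full, DEFAULT_FILE)
    return full

print(resolve("/files/OPS335", True))
print(resolve("/files/OPS335/assignment1.2.html", False))
```

Whatever the client sends, the result always sits below DOCUMENT_ROOT, which is the sense in which the path is relative.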
URLs
●
The structure of a URL is:
●
protocol://hostname:port/path
– With :port and /path being optional.
●
http://matrix.senecac.on.ca:80/~peter.callaghan/files/OPS335/assignment1.2.html
– protocol: http
– host: matrix.senecac.on.ca
– port: 80
– path: /~peter.callaghan/files/OPS335/assignment1.2.html
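The same breakdown can be checked with Python's standard urllib.parse module. This is a small illustration using the example URL from the slide, not part of the lab.

```python
from urllib.parse import urlsplit

url = "http://matrix.senecac.on.ca:80/~peter.callaghan/files/OPS335/assignment1.2.html"
parts = urlsplit(url)

print(parts.scheme)    # protocol
print(parts.hostname)  # host
print(parts.port)      # port (None when the URL does not include one)
print(parts.path)      # path
```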
URLs
●
There are quite a number of different protocols
available:
– ftp – file transfer protocol
– file – a file on the local machine
– http and https – hypertext transfer protocol (normal
and secure)
– mailto – an email address
– And more.
HTTP
●
We're mostly concerned here with HTTP
(HyperText Transfer Protocol), the general
language for web content.
●
On the server side, this requires that the host
be listening and ready to respond to requests
for web content.
– This is the job of a web server; we will use apache for
this.
HTTP
●
In the simple case, a user asks for a plain html
file (like my schedule page), and your server
finds it and gives it to them.
●
In a more complicated case, the user might
ask for more involved content, like a video, but
the general principle is the same.
●
Even when serving dynamic content (that can
change depending on what the user does),
this doesn't change much.
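The simple case can be sketched with Python's built-in http.server module standing in for apache. Everything here is illustrative: PAGE is a stand-in for a plain html file, and the server listens only on the local machine on whatever free port the operating system hands out.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>Schedule</body></html>"  # stand-in for a plain html file

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Status line, headers, blank line, then the content itself.
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the example quiet

def fetch_once():
    # Port 0 asks the OS for any free port, so no privileges are needed.
    server = HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", "/index.html")   # the user asks for a file
        response = conn.getresponse()        # the server gives it to them
        result = response.status, response.read()
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()

status, body = fetch_once()
print(status, body)
```

Serving a video or dynamic content would change the Content-Type and the body, but the request and response exchange keeps the same shape.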
CGI
●
CGI (Common Gateway Interface) is a
specification that allows the server to receive
extra data from the requesting machine and
change what it sends back depending on that
data.
●
This allows you to take advantage of the
capabilities of scripting languages (bash, perl,
python, etc.) to make your content different
depending on what the user has done.
– e.g. put their name in a page if they are logged in.
– Once you know your scripting this is surprisingly
easy: just a fixed set of header lines that you print
before you output your data.
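A minimal sketch of what such a script might look like in Python. The "name" query parameter and the cgi_response helper are invented for this example. The point is the order of the output: header lines, a blank line, then the content.

```python
import html
import os
from urllib.parse import parse_qs

def cgi_response(environ):
    """Build the text a CGI script would write to standard output."""
    # The web server passes request data in environment variables;
    # QUERY_STRING holds whatever followed "?" in the URL.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["guest"])[0]  # "name" is a hypothetical parameter
    # Headers first, then a blank line, then the content.
    return (
        "Content-Type: text/html\n"
        "\n"
        f"<html><body>Hello, {html.escape(name)}!</body></html>\n"
    )

if __name__ == "__main__":
    print(cgi_response(os.environ), end="")
```

Escaping the name with html.escape keeps user-supplied text from being interpreted as markup in the page.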
Apache
●
For most of the life of the web, apache has
been the most common webserver:
http://news.netcraft.com/archives/category/web-server-survey/
●
It is reliable, open-source, frequently updated,
etc.
●
Like postfix, it works pretty much from the
moment you install it (don't worry, it isn't as
finicky as postfix can be about configuration
parameters).
Apache - Configuration
●
Configuring apache is pretty similar to
configuring the other servers we've seen so
far.
– Find and edit the configuration parameters held in
a file, then restart it.
– By default, the file for apache is
/etc/httpd/conf/httpd.conf
– Like postfix's configuration file, it is well
documented, explaining what some common
parameters do.
●
Refer to the online docs for the full list.
Apache - Configuration
●
The configuration file is broken into three
general sections
– Global Environment – How apache as a whole will
act
– Main Server – How your default server will act
– Virtual Hosts – How other 'virtual' servers will act.
●
Yes, you can make apache act like several different web
servers on one machine. We'll look at this at the
beginning of next week.
Apache - Configuration
●
ServerRoot – The directory apache operates
out of: its configuration and log files live here, and
relative paths in the configuration are resolved against it.
– /etc/httpd
●
Timeout – How long to wait (in seconds)
before considering a partial conversation over.
●
Listen – Which port (and, optionally, IP address) to
listen on for requests
– 80 by default.
Apache - Configuration
●
User and Group allow you to change the
username and groupname apache runs under.
Apache - Configuration
●
In section 2
●
ServerAdmin – The email address to which problems with
the webserver should be reported.
●
ServerName – The name the webserver will
respond to (it should be a name that resolves in your DNS).
●
DocumentRoot – The top directory to serve
documents from.
●
DirectoryIndex – The default file to look for if
the user doesn't specify one. Usually
index.html
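To show what these directives look like in the file, here is a sketch that reads a small fragment in the httpd.conf style. The fragment is made up for illustration: only /etc/httpd, port 80 and index.html come from the slides, and the other values are placeholders. The read_directives helper is not part of apache.

```python
import shlex

# A made-up fragment in httpd.conf style; values are placeholders.
SAMPLE_CONF = '''
# Global environment
ServerRoot "/etc/httpd"
Timeout 60
Listen 80

# Main server
ServerAdmin webmaster@example.com
ServerName www.example.com
DocumentRoot "/var/www/html"
DirectoryIndex index.html
'''

def read_directives(text):
    """Collect 'Name value ...' lines into a dictionary."""
    directives = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, *values = shlex.split(line)  # shlex strips the quotes
        directives[name] = values
    return directives

conf = read_directives(SAMPLE_CONF)
print(conf["Listen"], conf["DocumentRoot"], conf["DirectoryIndex"])
```

This handles only simple one-line directives; block statements such as <Directory> would need more than this sketch does.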
Apache - Configuration
●
<Directory "path"> statements
– These allow you to configure how individual
directories will act
– Most of this relies on some of the numerous
parameters we don’t look at.
– Still handy to be aware that you can configure
different directories with different capabilities.
Apache - Configuration
●
A number of options control how apache
creates logs of activity.
●
LogLevel – How much to write to the logs.
– Ranges from debug (the most detail) to emerg (only
the most critical messages).
●
CustomLog – Where to record what got
requested by clients (the access log), and in what format.
– TransferLog does the same, using the format set by the most recent LogFormat.
●
ErrorLog – Where to write errors about the
service.
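Once requests are being logged, the entries can be picked apart with a short script. The sketch below assumes the server writes the Common Log Format (host, identity, user, time, request line, status, size). Your own configuration may use a different format, and the sample line is invented.

```python
import re

# One line in Common Log Format (sample values, not from a real server).
LINE = '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'

CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

def parse_line(line):
    """Return the fields of one log line, or None if it does not match."""
    match = CLF.match(line)
    return match.groupdict() if match else None

entry = parse_line(LINE)
print(entry["host"], entry["request"], entry["status"])
```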
Apache - Configuration
●
The third section deals with configuring virtual
servers.
●
You can make your one installation of apache
act as host for multiple, independent websites.
●
We'll deal with this next lecture.
– After you have had a chance to set up a basic
server in the lab.
Notes about the Lab
●
The webserver is probably where you are going to
see most activity.
●
Everyone wants to contact your webserver.
●
In the lab you'll see several ways of balancing
load between several servers.
– This way you can coordinate between several
moderately powerful servers instead of having a
supercomputer just for your web access.
– The iptables rules listed in the lab to do this work, but
there is a bug in how iptables saves them: it drops
'--packet 0' when it saves, breaking your rules. The
lab will remind you to remove them at the end.
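The idea behind spreading load can be illustrated with a simple round-robin rotation. This is not the iptables rule set from the lab, only a sketch of the general principle. The server names are hypothetical.

```python
from collections import Counter
from itertools import cycle

SERVERS = ["web1", "web2", "web3"]  # hypothetical server names

def assign(connections, servers):
    """Hand each new connection to the next server in turn."""
    rotation = cycle(servers)
    return [(conn, next(rotation)) for conn in connections]

result = assign(range(6), SERVERS)
print(result)
print(Counter(server for _, server in result))
```

With six connections and three servers, each server ends up with two, which is the even split that lets several moderate machines share the work.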
Summary
●
This week we took a brief look at webservers.
●
URLs used to find/access data
●
Configuring apache to provide resources over
the web.
●
Warnings about an issue you will encounter in
the lab.