M1-introductiontotheinternet

The document is a syllabus for a course on Web Technologies, covering topics such as the Internet's history, protocols, web browsers, and servers. It aims to provide students with a foundational understanding of the World Wide Web and practical skills in creating a WordPress website. Key references include Robert W. Sebesta's book on programming the web.

Uploaded by

ashasimon
Copyright
© All Rights Reserved
Introduction to the Internet

Module 1
KTU Course: CS368 Web Technologies
Instructor: Jinesh Jose
Department of Computer Science & Engineering
Government Engineering College Idukki
Syllabus
• Introduction to the Internet: The World Wide
Web, Web Browsers, Web Servers, Uniform
Resource Locators,
• Multipurpose Internet Mail Extensions,
• The Hypertext Transfer Protocol.
• Common Gateway Interface(CGI),
• Content Management System – Basics
• Case Study: Apache Server, WordPress.
Expected outcome
• Understand the basics of the WWW
• Create a WordPress website that allows
logged-in users to publish their blogs

Primary reference
Robert W Sebesta, Programming the World
Wide Web, 7/e, Pearson Education Inc., 2014.
History of Internet
The Internet is a huge collection of computers connected in a
communications network

• Origins
– ARPAnet - late 1960s and early 1970s
• Network reliability
• For ARPA-funded research organizations
– BITnet (Because It’s Time Network), CSnet - late 1970s & early 1980s
• email and file transfer for other institutions
– NSFnet - 1986
• Originally for non-DOD funded places
• Funded by National Science Foundation (NSF)
• Initially connected five supercomputer centers
• By 1990, it had replaced ARPAnet for non-military uses
• Soon became the network for all (by the early 1990s)
– NSFnet eventually became known as the Internet
Internet (contd.)
• What the Internet is:
– A world-wide network of computer networks
– At the lowest level, since 1982, all connections use TCP/IP
– TCP/IP hides the differences among devices connected to the Internet
• The Internet is actually a network of networks, rather than a network of computers
Internet timeline

Barry M. Leiner, Vinton G. Cerf, David D. Clark, Robert E. Kahn, Leonard Kleinrock, Daniel C. Lynch, Jon Postel, Larry G. Roberts, Stephen Wolff. A Brief History of the Internet. Internet Society.
http://www.isoc.org/internet/history/brief.shtml
Internet Protocols
• Internet Protocol (IP) Addresses
– Every node has a unique numeric address
– Form: 32-bit binary number
• New standard, IPv6, has 128 bits (1998)
• Organizations are assigned groups of IPs for
their computers
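The address sizes above can be checked with a short sketch. It uses Python's standard ipaddress module; the sample addresses and the /24 block are illustrative values from the documentation ranges, not addresses taken from these slides.

```python
import ipaddress

def address_bits(text):
    """Return the address width in bits: 32 for IPv4, 128 for IPv6."""
    return ipaddress.ip_address(text).max_prefixlen

def ipv4_as_int(text):
    """Return the 32-bit number that a dotted-quad IPv4 address stands for."""
    return int(ipaddress.IPv4Address(text))

def group_size(block):
    """Return how many addresses an assigned block (CIDR notation) contains."""
    return ipaddress.ip_network(block).num_addresses

if __name__ == "__main__":
    # Illustrative addresses only (documentation ranges).
    print(address_bits("192.0.2.10"), address_bits("2001:db8::1"))
    print(format(ipv4_as_int("192.0.2.10"), "032b"))
    print(group_size("192.0.2.0/24"))
```

An organization that is assigned the block in the last call would have 256 addresses to hand out to its computers.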
Internet Protocols (contd.)
• Domain names
– Form: host-name.domain-names
– First domain is the smallest; last is the largest
– Last domain specifies the type of organization
– Fully qualified domain name - the host name and
all of the domain names
– DNS servers - convert fully qualified domain
names to IPs
Sample domain name
movies.comedy.marxbros.com
• Here, movies is the hostname and comedy is
movies’s local domain, which is a part of
marxbros’s domain, which is a part of the com
domain.
• The hostname and all of the domain names
are together called a fully qualified domain
name
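As a small sketch of the naming rule, the function below splits a fully qualified domain name into its host name and its list of domains, smallest first. The function name split_fqdn is hypothetical and chosen only for this illustration.

```python
def split_fqdn(fqdn):
    """Split a fully qualified domain name into (host name, domains).

    The first label is the host; the remaining labels are the enclosing
    domains, ordered from the smallest to the largest.
    """
    labels = fqdn.rstrip(".").split(".")
    return labels[0], labels[1:]

if __name__ == "__main__":
    host, domains = split_fqdn("movies.comedy.marxbros.com")
    print(host)      # the host name
    print(domains)   # local domain first, top-level domain last
```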
Domain Name System
Proliferation of protocols
Problem: By the mid-1980s, several different protocols had been invented and were being used on the Internet, all with different user interfaces (Telnet, FTP, Usenet, mailto etc.)
• Restricted the growth of the Internet.
• Users were required to learn all the different interfaces
Client & Server
• Clients and Servers are programs that
communicate with each other over the Internet
• A Server runs continuously, waiting to be
contacted by a Client
– Each Server provides certain services
– Services include providing web pages
• A Client will send a message to a Server
requesting the service provided by that server
– The client will usually provide some information,
parameters, with the request
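The client/server relationship can be sketched with two small functions on one machine. This is only an illustration under stated assumptions: the echo reply, the loopback address and the message text are invented for the example, and a real server would loop forever instead of answering once.

```python
import socket
import threading

def serve_once(listener):
    """Server side: wait for one client, read its request, send a reply."""
    conn, _addr = listener.accept()
    with conn:
        request = conn.recv(1024).decode("ascii")
        conn.sendall(("echo: " + request).encode("ascii"))

def ask(port, message):
    """Client side: contact the server, send parameters, return the reply."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(message.encode("ascii"))
        return client.recv(1024).decode("ascii")

def demo():
    """Run one request/response exchange between a server and a client."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.settimeout(5)
        listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(listener,))
        worker.start()
        reply = ask(port, "page=index")
        worker.join()
    return reply

if __name__ == "__main__":
    print(demo())
```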
The World Wide Web
• A possible solution to the proliferation of different
protocols being used on the Internet
• Origins
– Tim Berners-Lee at CERN proposed the Web in 1989
• Purpose: to allow scientists to have access to many databases of
scientific work through their own computers
• Document form: hypertext - text with embedded links to text in other documents, to allow non-sequential browsing of textual material
– Pages? Documents? Resources?
• We’ll call them documents
– Hypermedia – more than just text – images, sound, etc.
Web or Internet
• The Internet is a collection of computers and
other devices connected by equipment that
allows them to communicate with each other
• The Web is a collection of software and
protocols that has been installed on most, if
not all, of the computers on the Internet
• Web uses one of the protocols, http, that runs
on the Internet--there are several others
(telnet, mailto, etc.)
Web Browsers
• Browsers are clients – initiate requests, servers react
(response)
• Documents provided by servers on the Web are
requested by browsers, which are programs running
on client machines
• Mosaic - NCSA (Univ. of Illinois), in early 1993
– First to use a GUI, led to explosion of Web use
– Initially for X-Windows, under UNIX, but was ported to
other platforms by late 1993
• Most requests are for existing documents, using
HyperText Transfer Protocol (HTTP)
– But some requests are for program execution, with the
output being returned as a document
Desktop Browsers
Web Servers
• Provide responses to browser requests, either
existing documents or dynamically built
documents
• Browser-server connection is now maintained
through more than one request-response
cycle
• All communications between browsers and
servers use Hypertext Transfer Protocol
(HTTP)
Web Servers
• Web servers run as background processes in
the operating system
– Monitor a communications port on the host,
accepting HTTP messages when they appear
• All current Web servers came from either
1. The original from CERN
2. The second one, from NCSA
Web Server
Web Server operation details
• Web servers have two main directories:
1. Document root (servable documents)
2. Server root (server system software)
• Document root is accessed indirectly by clients
– Its actual location is set by the server configuration file
– Requests are mapped to the actual location
• Web servers now support other Internet protocols
Web Server – Virtual document tree
• Virtual document trees - Many servers allow part of
the servable document collection to be stored outside
the directory at the document root. The secondary
areas from which documents can be served are called
virtual document trees
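How a server might map a requested path onto the document root or onto a virtual document tree can be sketched as below. The directory names and the alias table are hypothetical, and the sketch ignores the safety checks (such as rejecting ".." in paths) that a real server needs.

```python
import posixpath

def map_request(path, document_root, aliases=None):
    """Map a URL path to a file location.

    `aliases` maps a URL prefix to a directory outside the document root
    (a virtual document tree). Any other path is resolved under the
    document root.
    """
    for prefix, target in (aliases or {}).items():
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            return posixpath.join(target, path[len(prefix):].lstrip("/"))
    return posixpath.join(document_root, path.lstrip("/"))

if __name__ == "__main__":
    # Hypothetical locations, for illustration only.
    trees = {"/bulletins": "/usr/local/docs"}
    print(map_request("/index.html", "/var/www/html", trees))
    print(map_request("/bulletins/a.html", "/var/www/html", trees))
```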
Web Server – Virtual host
• Virtual hosts - servers can support more than one site
on a computer, potentially reducing the cost of each
site and making their maintenance more convenient.
Such secondary hosts are called virtual hosts
Web Server – Proxy servers
• Proxy servers can serve documents that are in the document root of other machines on the Web.
Apache
• Apache (open source, fast, reliable)
• Apache began as the NCSA server, httpd, with some added features
• Historically an Apache server had three configuration files: httpd.conf, srm.conf, and access.conf; current releases use only httpd.conf by default
– Directives (operation control):
ServerName
ServerRoot
ServerAdmin
DocumentRoot
Alias
Redirect
DirectoryIndex
UserDir
IIS
• IIS
– Provided by Microsoft for the Windows platform
– Operation is maintained through a program with a GUI
• Changes are made through a window-based management program, named the IIS snap-in, which controls both IIS and ftp
Uniform Resource Identifier
• Web resources need names/identifiers – Uniform Resource
Identifiers (URIs)
– Resource can reside anywhere on the Internet
• URIs are a somewhat abstract notion
– A pointer to a resource to which request methods can be applied to
generate potentially different responses
• A request method is, e.g., fetching or changing the object
• Instance: http://www.foo.com/index.html
– Protocol, server, resource
• Most popular form of a URI is the Uniform Resource Locator
(URL)
– Differences between URI and URL are beyond scope
– RFC 2396
Uniform Resource Locators
• <scheme>://<server-domain-name>/<pathname>
– <scheme> which protocol to use
• http: in general
• file: tells the client that the document is on the local machine
• ftp: file transfer protocol
– <server-domain-name> identifies the server system
• e.g. www.iitm.ac.in
– <pathname> tells the server where to find the file
• Example: http://www.gecidukki.ac.in/index.html
URLs
• Host name may include a port number, as in
gecidukki.ac.in:80 (80 is the default, so specifying it here is redundant)
• URLs cannot include spaces or any of a collection of
other special characters (semicolons, colons, ...)
• The doc path may be abbreviated as a partial path
– The rest is furnished by the server configuration
• If the doc path ends with a slash, it means it is a
directory
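The parts of a URL can be pulled apart with the standard urllib.parse module. The sketch below is illustrative: the helper name url_parts and the small table of default ports are assumptions made for this example.

```python
from urllib.parse import urlsplit

# Well-known default ports for a few schemes (illustrative table).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def url_parts(url):
    """Split a URL into scheme, host, port, path, and a directory flag."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        # Fall back to the scheme's default when no port is written.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
        # A path that ends with a slash names a directory.
        "is_directory": parts.path.endswith("/"),
    }

if __name__ == "__main__":
    print(url_parts("http://www.gecidukki.ac.in/index.html"))
    print(url_parts("http://gecidukki.ac.in:80/"))
```

Both calls report port 80, which is why writing :80 explicitly adds nothing for an http URL.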
URL Encoding
• URLs can never have embedded spaces. Also, there is a collection of special characters, including semicolons, colons, and ampersands (&), that cannot appear as ordinary data in a URL.
• To include a space or one of the disallowed special characters, the character must be coded as a percent sign (%) followed by the two-digit hexadecimal ASCII code for the character; for example, a space becomes %20.
• Converting a string for use in a URL by applying these rules is called URL encoding.
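The encoding rule can be sketched in a few lines. The hand-written percent_encode function below is hypothetical and covers only the characters named on the slide; the standard library functions quote and unquote do the full job.

```python
from urllib.parse import quote, unquote

def percent_encode(text, unsafe=" ;:&"):
    """Replace each unsafe character with % plus its two-digit hex ASCII code."""
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in unsafe else ch for ch in text
    )

if __name__ == "__main__":
    print(percent_encode("San Jose"))
    print(percent_encode("a;b:c&d"))
    # The standard library applies the same rule to a wider character set.
    print(quote("a;b:c&d", safe=""))
    print(unquote("San%20Jose"))
```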
What happens when you click on a
hyper link?
• Determine the URL and extract the domain name.
• Use the name server (DNS) to get the IP address.
• Make a TCP connection to port 80.
• Send a request for a web page once the server has accepted the connection.
• The server sends the file and releases the TCP connection.
• The client displays the document.
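The steps above can be sketched as follows. This is a simplified illustration, not how a real browser is built: plan_fetch and fetch are hypothetical names, the request uses HTTP/1.0 so the server closes the connection when it is done, and fetch needs network access to run.

```python
import socket
from urllib.parse import urlsplit

def plan_fetch(url):
    """Step 1: take the URL apart and prepare the request text."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80
    path = parts.path or "/"
    request = "GET {} HTTP/1.0\r\nHost: {}\r\n\r\n".format(path, host)
    return host, port, request

def fetch(url):
    """Steps 2 to 5: resolve the name, connect, send, read until close."""
    host, port, request = plan_fetch(url)
    address = socket.gethostbyname(host)              # DNS lookup
    with socket.create_connection((address, port), timeout=10) as conn:
        conn.sendall(request.encode("ascii"))         # send the request
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:                              # server released the connection
                break
            chunks.append(data)
    return b"".join(chunks)                           # step 6 would display this

if __name__ == "__main__":
    print(plan_fetch("http://www.gecidukki.ac.in/index.html"))
```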
Multipurpose Internet Mail Extensions
(MIME)
• Originally developed for email
• Used to specify to the browser the form of a file
returned by the server (attached by the server to
the beginning of the document)
• Type specifications
– Form:
type/subtype
– Examples: text/plain, text/html, image/gif, image/jpeg
MIME
• Server gets type from the requested file name’s
suffix (.html implies text/html)
• Browser gets the type explicitly from the server
• Experimental types
• Subtype begins with x-
• e.g., video/x-msvideo
• Experimental types require the server to send a
helper application or plug-in so the browser can
deal with the file
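A short sketch of both ideas: looking up a MIME type from a file name suffix, as a server does, and taking a type/subtype string apart. It uses Python's standard mimetypes module; the file names and the helper describe are made up for the example.

```python
import mimetypes

def type_for(filename):
    """Return the MIME type implied by the file name's suffix, or None."""
    return mimetypes.guess_type(filename)[0]

def describe(mime):
    """Split type/subtype and report whether the subtype is experimental (x-)."""
    major, _, minor = mime.partition("/")
    return major, minor, minor.startswith("x-")

if __name__ == "__main__":
    for name in ("index.html", "notes.txt", "logo.gif", "photo.jpeg"):
        print(name, type_for(name))
    print(describe("video/x-msvideo"))
```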
The Hypertext Transfer Protocol
• Originally proposed by Tim Berners-Lee
• The Web's application layer protocol
• At present, specifications are maintained by the IETF
HTTP Working Group (http://httpwg.org/specs/)
The Hypertext Transfer Protocol
• The protocol used by ALL Web
communications
• Protocol for client/server communication
– The heart of the Web
– Very simple request/response protocol
• Client sends request message, server replies with response message
– Stateless
– Relies on URI naming mechanism
• Three versions have been used
– 0.9/1.0 – very close to Berners-Lee’s original
• RFC 1945 (original RFC is now expired)
– 1.1 – developed to enhance performance, caching, compression
• RFC 2068
– 2.0 released in 2015
– 3.0 specification development is in progress
• HTTP: hypertext transfer protocol
• Web’s application layer protocol
• client/server model
o client: browser that requests, receives, “displays” Web objects
o server: Web server sends objects in response to requests
• HTTP 1.0: RFC 1945
• HTTP 1.1: RFC 2068
• HTTP 2.0: RFC 7540
[Figure: a PC running Explorer and a Mac running Navigator each exchange HTTP requests and responses with a server running the Apache Web server]
HTTP Basic features
• HTTP is connectionless: The HTTP client, i.e., a browser, opens a connection and sends an HTTP request; the server processes the request and sends the response back over that same connection. In HTTP/1.0 the connection is then closed, so no connection persists between one request and the next.
• HTTP is media independent: Any type of data can be sent by HTTP as long as both the client and the server know how to handle the data content. The client as well as the server must specify the content type using the appropriate MIME type.
• HTTP is stateless: The server and client are aware of each other only during the current request. Afterwards, both of them forget about each other. Because of this, neither the client nor the server can retain information between different requests across the web pages.
HTTP PDU
• HTTP is a text based protocol
HTTP Request Format
request-line (method request-URI HTTP-version)
headers (0 or more)
<blank line>
body (only for requests that carry data, such as POST)
• First type of HTTP message: requests
– Client browsers construct and send the message
• Typical HTTP request:
– GET http://www.foo.edu/index.html HTTP/1.0
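Because the request is plain text, the format can be illustrated by building and parsing one with ordinary string operations. The function names below are hypothetical, and the sketch skips details such as repeated headers.

```python
def build_request(method, uri, headers=None, body="", version="HTTP/1.0"):
    """Assemble request line, headers, blank line and body into one message."""
    lines = ["{} {} {}".format(method, uri, version)]
    lines += ["{}: {}".format(name, value) for name, value in (headers or {}).items()]
    return "\r\n".join(lines) + "\r\n\r\n" + body

def parse_request(raw):
    """Split a request message back into its parts."""
    head, _, body = raw.partition("\r\n\r\n")       # the blank line ends the headers
    lines = head.split("\r\n")
    method, uri, version = lines[0].split(" ", 2)   # the request line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, uri, version, headers, body

if __name__ == "__main__":
    message = build_request("GET", "/index.html", {"Accept": "text/*"})
    print(repr(message))
    print(parse_request(message))
```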
HTTP Response format
status-line (HTTP-version response-code
response-phrase)
headers (0 or more)
<blank line>
body
• Second type of HTTP message: response
– Web servers construct and send response messages
• Typical HTTP response:
– HTTP/1.0 301 Moved Permanently
Location: http://www.foo.com/cs/index.html
HTTP Headers
HTTP header fields provide required information about
the request or response, or about the object sent in
the message body. There are four types of HTTP
message headers:
• General-header: These header fields have general
applicability for both request and response messages.
• Request-header: These header fields have applicability
only for request messages.
• Response-header: These header fields have
applicability only for response messages.
• Entity-header: These header fields define
meta-information about the entity-body or, if no body
is present, about the resource identified by the
request.
HTTP Headers
• Both requests and responses can contain a variable number of header
fields
• Common request fields:
– Accept: text/plain
– Accept: text/*
– If-Modified-Since: date
• Common response fields:
– Content-length: 488
– Content-type: text/html
HTTP Response
• Form:
Status line
Response header fields
blank line
Response body
• Status line format:
HTTP version status code explanation
• Example: HTTP/1.1 200 OK
(Current version is 1.1)
• The header field, Content-type, is required
HTTP Response codes
• 1xx – Informational – request received,
processing
• 2xx – Success – action received, understood,
accepted
• 3xx – Redirection – further action necessary
• 4xx – Client Error – bad syntax or cannot be
fulfilled
• 5xx – Server Error – server failed
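The first digit of the code decides the class, which a short sketch can make concrete. The helper names are hypothetical; http.HTTPStatus is part of Python's standard library and supplies the standard explanation phrases.

```python
from http import HTTPStatus

CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def parse_status_line(line):
    """Split a status line into HTTP version, numeric code and explanation."""
    version, code, phrase = line.split(" ", 2)
    return version, int(code), phrase

def status_class(code):
    """Return the class of a status code from its first digit."""
    return CLASSES[code // 100]

if __name__ == "__main__":
    version, code, phrase = parse_status_line("HTTP/1.0 301 Moved Permanently")
    print(code, status_class(code))
    print(HTTPStatus(404).phrase, status_class(404))
```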
HTTP Response example
HTTP methods
• GET - Fetch a document
• POST - Execute the document, using the data in
body
• HEAD - Fetch just the header of the document
• PUT - Store a new document on the server
• DELETE - Remove a document from the server
• OPTIONS – retrieve information about available
options
• TRACE – loopback request message
• CONNECT – establish a tunnel to a server, typically through a proxy
GET Method
• A GET request retrieves data from a web server by specifying
parameters in the URL portion of the request.

GET Request
GET /hello.htm HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.tutorialspoint.com
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive

Response
HTTP/1.1 200 OK
Date: Mon, 27 Jul 2009 12:28:53 GMT
Server: Apache/2.2.14 (Win32)
Last-Modified: Wed, 22 Jul 2009 19:15:56 GMT
ETag: "34aa387-d-1568eb00"
Vary: Authorization,Accept
Accept-Ranges: bytes
Content-Length: 88
Content-Type: text/html
Connection: Closed

<html> <body> <h1>Hello, World!</h1> </body> </html>
HEAD Method
The HEAD method is functionally similar to GET, except that the
server replies with a response line and headers, but no
entity-body.

HEAD Request
HEAD /hello.htm HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.tutorialspoint.com
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive

Response
HTTP/1.1 200 OK
Date: Mon, 27 Jul 2009 12:28:53 GMT
Server: Apache/2.2.14 (Win32)
Last-Modified: Wed, 22 Jul 2009 19:15:56 GMT
ETag: "34aa387-d-1568eb00"
Vary: Authorization,Accept
Accept-Ranges: bytes
Content-Length: 88
Content-Type: text/html
Connection: Closed
POST Method
The POST method is used when you want to send some data to
the server, for example, file update, form data, etc
POST Request
POST /cgi-bin/process.cgi HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.tutorialspoint.com
Content-Type: text/xml; charset=utf-8
Content-Length: 88
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive

<?xml version="1.0" encoding="utf-8"?> <string xmlns="http://clearforest.com/">string</string>

Response
HTTP/1.1 200 OK
Date: Mon, 27 Jul 2009 12:28:53 GMT
Server: Apache/2.2.14 (Win32)
Last-Modified: Wed, 22 Jul 2009 19:15:56 GMT
ETag: "34aa387-d-1568eb00"
Vary: Authorization,Accept
Accept-Ranges: bytes
Content-Length: 88
Content-Type: text/html
Connection: Closed

<html> <body> <h1>Request Processed Successfully</h1> </body> </html>
PUT Method
The PUT method is used to request the server to store the
included entity-body at a location specified by the given URL.
PUT Request
PUT /hello.htm HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.tutorialspoint.com
Accept-Language: en-us
Connection: Keep-Alive
Content-type: text/html
Content-Length: 182

<html> <body> <h1>Hello, World!</h1> </body> </html>

Response
HTTP/1.1 201 Created
Date: Mon, 27 Jul 2009 12:28:53 GMT
Server: Apache/2.2.14 (Win32)
Content-type: text/html
Content-length: 30
Connection: Closed

<html> <body> <h1>The file was created.</h1> </body> </html>
DELETE Method
The DELETE method is used to request the server to delete a file
at a location specified by the given URL.
DELETE Request
DELETE /hello.htm HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.tutorialspoint.com
Accept-Language: en-us
Connection: Keep-Alive

Response
HTTP/1.1 200 OK
Date: Mon, 27 Jul 2009 12:28:53 GMT
Server: Apache/2.2.14 (Win32)
Content-type: text/html
Content-length: 30
Connection: Closed

<html> <body> <h1>URL deleted.</h1> </body> </html>
CONNECT Method
The CONNECT method is used by the client to establish a
network connection to a web server over HTTP
CONNECT Request
CONNECT www.tutorialspoint.com HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)

Response
HTTP/1.1 200 Connection established
Date: Mon, 27 Jul 2009 12:28:53 GMT
Server: Apache/2.2.14 (Win32)
OPTIONS Method
The OPTIONS method is used by the client to find out the HTTP
methods and other options supported by a web server. The
client can specify a URL for the OPTIONS method, or an
asterisk (*) to refer to the entire server.

OPTIONS Request
OPTIONS * HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)

Response
HTTP/1.1 200 OK
Date: Mon, 27 Jul 2009 12:28:53 GMT
Server: Apache/2.2.14 (Win32)
Allow: GET,HEAD,POST,OPTIONS,TRACE
Content-Type: httpd/unix-directory
TRACE Method
The TRACE method is used to echo the contents of an HTTP
Request back to the requester which can be used for
debugging purpose at the time of development.

TRACE Request
TRACE / HTTP/1.1
Host: www.tutorialspoint.com
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)

Response
HTTP/1.1 200 OK
Date: Mon, 27 Jul 2009 12:28:53 GMT
Server: Apache/2.2.14 (Win32)
Connection: close
Content-Type: message/http
Content-Length: 39

TRACE / HTTP/1.1
Host: www.tutorialspoint.com
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
HTTP/1.0 Network Interaction
• Clients make requests to port 80 on servers
– Uses DNS to resolve server name
• Clients make separate TCP connection for each URL
– Some browsers open multiple TCP connections
• Netscape default = 4
• Server returns HTML page
– Many types of servers with a variety of implementations
– Apache is the most widely used
• Freely available in source form
• Client parses page
– Requests embedded objects
HTTP/1.1 Performance Enhancements
• HTTP/1.0 is a “stop and wait” protocol
– Separate TCP connection for each file
• Connection setup and teardown are incurred for each file
• Inefficient use of packets
• Server must maintain many connections in TIME_WAIT
• Mogul and Padmanabhan studied these issues in ’95
– Resulted in HTTP/1.1 specification focused on
performance enhancements
• Persistent connections
• Pipelining
• Enhanced caching options
• Support for compression
Persistent Connections and Pipelining
• Persistent connections
– Use the same TCP connection(s) for transfer of multiple
files
– Reduces packet traffic significantly
– May or may not increase performance from client
perspective
• Load on server increases
• Pipelining
– Send several requests back to back without waiting for each response, packing as much data into a packet as possible
– Requires length field(s) within header
– May or may not reduce packet traffic or increase
performance
• Page structure is critical
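The effect of these options can be estimated with a deliberately simplified round-trip count. The model below is an assumption made for illustration: every connection setup costs one round trip, every request/response costs one round trip, transmission time is ignored, and pipelined requests for all embedded objects share a single round trip.

```python
def round_trips(objects, persistent, pipelined=False):
    """Count round trips to fetch one page plus `objects` embedded files."""
    files = 1 + objects
    if not persistent:
        # HTTP/1.0 style: connect and request separately for every file.
        return 2 * files
    if not pipelined:
        # Connect once, then one round trip per file.
        return 1 + files
    # Connect, fetch the page, then request all embedded objects together.
    return 1 + 1 + (1 if objects else 0)

if __name__ == "__main__":
    for label, args in (
        ("non-persistent", (10, False)),
        ("persistent", (10, True)),
        ("persistent + pipelining", (10, True, True)),
    ):
        print(label, round_trips(*args))
```

Real pages rarely match this model, which is why the slide notes that page structure is critical.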
What HTTP/1.1 problems does HTTP/2
address?
• Multiplexing: Multiple asynchronous HTTP requests over a single TCP connection.
With HTTP/1.1 pipelining, requests can be sent without waiting, but the server must send its responses in the same order that the requests were received, so the entire connection remains first-in-first-out and head-of-line (HOL) blocking can occur (a delay in one response delays the others too).
HTTP/2 addresses this by introducing multiplexing, in which both requests and responses are interleaved asynchronously over a single TCP connection, which removes the HOL blocking issue at the HTTP level.
What HTTP/1.1 problems does HTTP/2
address?
• Multiple TCP connections for the same domain: With HTTP/1.1, browsers typically open several TCP connections to the same domain to make multiple HTTP requests in parallel. With HTTP/2, a single TCP connection per domain is normally sufficient, and all HTTP requests are made through that connection asynchronously using multiplexing.
• TCP connection lifetime: HTTP/1.1 connections are persistent by default but are often closed after a short idle period. An HTTP/2 connection is intended to stay open for a longer period of time and carry many requests.
• Header compression: HTTP/1.1 has no header compression, but HTTP/2 compresses headers, which decreases latency further.
HTTP/2
• Multiplexing: Multiple asynchronous HTTP
requests over a single TCP connection.
• Server Push: Multiple responses for single request
• Header Compression: Compress HTTP headers
along with content.
• Request prioritization: Multiple HTTP requests
to the same domain can be prioritized.
• Binary protocol: HTTP/2 is a binary protocol,
whereas HTTP/1.1 is a text protocol.
HTTP 3
• A future backwards-incompatible revision to HTTP
• Semantic backwards compatibility
• Firm limits on protocol elements (sizes, etc.)
• Data-aware header encoding
• Security / encryption requirements
• Multiple metadata buckets / labels
• Push as an extension?
• Binary encoding of headers w/ data type
awareness
• EXTENDED_SETTINGS
Common Gateway Interface(CGI)
• Markup languages cannot be used to specify
computations, interactions with users, or to provide
access to databases
• CGI is a common way to provide for these needs, by
allowing browsers to request the execution of
server-resident software
• CGI is just an interface between browsers and servers
• An HTTP request to run a CGI program specifies a
program, rather than a document
CGI
Servers can recognize such requests in two ways:
1. By the location of the requested file (special subdirectories for such files, e.g. cgi-bin)
2. A server can be configured to recognize executable files by their file name extensions (like *.cgi)
A CGI program can produce a complete HTTP response, or just the URL of an existing document
CGI

• A CGI script is any program that runs on a web server.
• CGI defines a standard way in which information may be passed to and from the browser and server.
• Any program or script that can process information according to the CGI specification can, in theory, be used to code a CGI script
CGI
• CGI scripts can exist in many forms -- depending upon what the server
supports.
• CGI scripts can be compiled programs or batch files or any executable
entity. For simplicity we will use the term script for all CGI entities.
• Typically CGI scripts are written as:
• Perl scripts, C/C++ programs, Unix shell scripts
• CGI scripts therefore have to be written (and maybe compiled) and
checked for errors before they are run on the server.
• CGI can be called and run in a variety of ways on the server.
• The 2 most common ways of running a CGI script are:
• From an HTML Form -- the ACTION attribute of the form specifies the CGI
script to be run.
• Direct URL reference -- A CGI script can be run directly by giving the URL
explicitly in HTML
Where to place CGI programs in Web
Server?
• Normally placed in a special directory known by the server to contain executable programs.
• In most environments, this directory is named cgi-bin.
• In the case of xampp it is X:\xampp\cgi-bin
How to send input from the
user/server to a CGI program?
• There are two basic ways in which data are
passed from a server to the cgi program.
• The first is through environment variables; the
second is through the standard input
file, STDIN.
Input - Environment Variables
• Environment variables are global variables
set by the server that are then inherited
by the cgi program. The names of these
variables are fixed, and the cgi program
must access them through those assigned
names. Some of these environment variables are:
• HTTP_USER_AGENT SERVER_NAME
• QUERY_STRING SERVER_PORT
• HTTP_ACCEPT SERVER_PROTOCOL
• PATH_INFO REMOTE_ADDR
• DOCUMENT_ROOT PATH
• PATH_TRANSLATED GATEWAY_INTERFACE
• REQUEST_METHOD SCRIPT_NAME
• SERVER_SOFTWARE REMOTE_HOST
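A CGI program written in any language reads these values from its process environment. The sketch below shows the idea in Python; the helper name and the choice of which variables to report are assumptions for the example, and missing variables are reported as empty strings.

```python
import os

# A few of the fixed variable names a server sets for a CGI program.
NAMES = ("REQUEST_METHOD", "QUERY_STRING", "SERVER_NAME", "SERVER_PORT", "REMOTE_ADDR")

def cgi_environment(env=None):
    """Collect selected CGI variables from the inherited environment."""
    env = os.environ if env is None else env
    return {name: env.get(name, "") for name in NAMES}

if __name__ == "__main__":
    # Run outside a web server, most of these will simply be empty.
    for name, value in cgi_environment().items():
        print(name, "=", value)
```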
Input - STDIN
• If the method used by a form is POST, the form data arrives on STDIN; you can then read the data and process it accordingly. The data you read from STDIN must be parsed into attribute/value pairs.
• This will require that you:
– split the data by ampersand (&) into attribute=value pairs
– split each pair on = (if working in Perl, you may wish to place the attribute=value pairs into an associative array)
– translate pluses (+) to spaces
– translate special characters in hex (%XX) to their regular character form
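The parsing steps can be sketched as below. The function names are hypothetical; unquote_plus from the standard library performs the last two steps (pluses to spaces, %XX to characters) in one call, and the sample field values are invented.

```python
import io
from urllib.parse import unquote_plus

def read_post_body(env, stream):
    """Read exactly CONTENT_LENGTH characters of form data from the input stream."""
    length = int(env.get("CONTENT_LENGTH") or 0)
    return stream.read(length)

def parse_form(data):
    """Turn 'a=1&b=two+words' into a dictionary of decoded names and values."""
    form = {}
    for pair in data.split("&"):            # split into attribute=value pairs
        if not pair:
            continue
        name, _, value = pair.partition("=")  # split each pair on the first =
        form[unquote_plus(name)] = unquote_plus(value)
    return form

if __name__ == "__main__":
    body = "first_name=Groucho&last_name=Marx+Bros%21"
    env = {"CONTENT_LENGTH": str(len(body))}
    print(parse_form(read_post_body(env, io.StringIO(body))))
```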
CGI - Processing the data
• Once you have parsed any input from STDIN, you are ready to process it and any data received through an environment variable.
• At this point, you are in a conventional programming context and your program can do virtually anything a conventional program written in the language you are using can do.
Sending data back to the user/client
• A CGI program can do so by writing the data
to STDOUT.
• Generate print statements as if the data were
being sent to a terminal or to a printer.
• Generated content should be proper HTML
• The server header lines are the following:

Status (just the numeric return code followed by the brief text explanation, since the server will
add the server version)
Content-type, in the usual MIME form
Location, (optional) a URL to be followed and returned to the client
blank line (CRLF)
After the header lines, the cgi program generates its data (HTML?)
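The output layout can be sketched as a function that assembles the header lines, the blank line and the data. The function name and its default values are assumptions for the illustration; a CGI program would write the returned text to STDOUT.

```python
def cgi_response(body, status="200 OK", content_type="text/html", location=None):
    """Build CGI output: header lines, a blank line, then the generated data."""
    lines = ["Status: " + status, "Content-type: " + content_type]
    if location:
        lines.append("Location: " + location)   # optional header
    return "\r\n".join(lines) + "\r\n\r\n" + body

if __name__ == "__main__":
    print(cgi_response("<html><body><h2>Hello, world!</h2></body></html>"), end="")
```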
CGI - Samples
#!/usr/bin/perl -wT
# Minimal CGI script: header line, blank line, then the HTML body.
print "Content-type: text/html\n\n";
print "<html><head><title>Hello World</title></head>\n";
print "<body>\n";
print "<h2>Hello, world!</h2>\n";
print "</body></html>\n";
CGI - Samples
#!/usr/local/bin/perl
# Echo every CGI environment variable as an HTML list.
print "Status: 200 OK\n";
print "Content-type: text/html\n\n";
print "<HTML>\n";
print "<HEAD>\n";
print "<TITLE>echo cgi env. vars.</TITLE>\n";
print "</HEAD>\n";
print "<BODY>\n";
print "<H2>Echo CGI Environment Variables</H2>\n";
print "<HR>\n";
print "<H3>Environment Variables</H3>\n";
print "<UL>\n";
foreach $key (sort keys %ENV)
{ print "<LI>$key = $ENV{$key}\n"; }
print "</UL>\n";
print "</BODY>\n";
print "</HTML>\n";
CGI - Form
<FORM action = "/cgi-bin/hello_get.cgi" method = "GET">
First Name: <input type = "text" name = "first_name"> <br>
Last Name: <input type = "text" name = "last_name">
<input type = "submit" value = "Submit">
</FORM>
CGI - hello_get.cgi
#!/usr/bin/perl
# Read form data sent by GET (query string) or POST (standard input).
my ($buffer, @pairs, $pair, $name, $value, %FORM);
$ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;

if ($ENV{'REQUEST_METHOD'} eq "POST") {
    read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
} else {
    $buffer = $ENV{'QUERY_STRING'};
}

# Split information into name/value pairs
@pairs = split(/&/, $buffer);

foreach $pair (@pairs) {
    ($name, $value) = split(/=/, $pair);
    $value =~ tr/+/ /;                                     # pluses encode spaces
    $value =~ s/%([0-9A-Fa-f]{2})/pack("C", hex($1))/eg;   # %XX to character
    $FORM{$name} = $value;
}

$first_name = $FORM{first_name};
$last_name = $FORM{last_name};

print "Content-type:text/html\r\n\r\n";
print "<html>";
print "<head>";
print "<title>Hello - Second CGI Program</title>";
print "</head>";
print "<body>";
print "<h2>Hello $first_name $last_name - Second CGI Program</h2>";
print "</body>";
print "</html>";
Content Management System
• A Content Management System (CMS) is software that stores all the content of a website, such as text, photos, music, and documents, and makes it available on the site.
• It helps in editing, publishing and modifying
the content of the website.
Content Management System – Basics
• A CMS allows non-technical users to make
changes to an existing website with little or no
training.
• Primarily a Web-site maintenance tool for
non-technical administrators.
• Typically requires an experienced coder to set
up and add features
CMS
• A CMS facilitates:
• Content creation
• Content control
• Editing
• Maintenance functions
CMS
• Designed for users with little or no knowledge
of programming languages or markup
languages
• Tools to create and manage content with
relative ease of use.
• Most systems use a database to store content,
metadata, and/or artifacts that might be
needed by the system.
CMS – Key features
• Automated templates
• Easily editable content
• Scalable feature sets
• Web standards upgrades
• Workflow management
• Delegation
• Document management
• Content virtualization
Key features
Automated templates
• Create standard output templates (usually
HTML and XML) that can be automatically
applied to new and existing content
• Allows the appearance of all content to be
changed from one central place.
Key features
Easily editable content
• Once content is separated from the visual
presentation of a site, it usually becomes
much easier and quicker to edit and
manipulate.
• Most web content management system (WCMS)
software includes WYSIWYG editing tools, allowing
non-technical individuals to create and edit content.
Key features
Scalable feature sets
• Most WCMS software includes plug-ins or modules
that can be easily installed to extend an existing
site's functionality.
Web standards upgrades
• Active WCMS software usually receives regular
updates that include new feature sets and keep the
system up to current web standards.
Key features
Workflow management
• Workflow is the process of creating cycles of
sequential and parallel tasks that must be
accomplished in the CMS. E.g:
– A content creator submits a story
– The copy editor cleans it up
– The editor-in-chief approves it.
– Only then is it published.
Key features
Delegation
• Allows various user groups to have limited
privileges over specific content on the
website.
• Spreads out the responsibility of content
management.
Key features
Document management
• Provides a means of managing the life cycle of
a document:
– initial creation
– revisions
– publication
– archive
– document destruction.
Case Study: WordPress
• Student presentation
Case Study: Apache Server
• Student presentation
